Trick or Tips 002 {R}

November 12, 2017

  R tips trickortips
  base utils graphics magrittr raster knitr

Kevin Cazelles

   

Trick or Tips

Ever stumbled on a code chunk that made you say: "I should have known this f_ piece of code long ago!" Chances are you have, frustratingly, just like we have, and on multiple occasions too. In comes Trick or Tips!

Trick or Tips is a series of blog posts that each present 5 -- hopefully helpful -- coding tips for a specific programming language. Posts should be short (i.e. no more than 5 lines of code, max 80 characters per line, except when appropriate) and provide tips of many kinds: a function, a way of combining functions, a single argument, a note about the philosophy of the language and its practical consequences, tricks to improve the way you code, good practices, etc.

Note that while some tips might be obvious for careful documentation readers (God bless them for their wisdom), we do our best to present what we find very useful and underestimated. By the way, there are undoubtedly similar initiatives on the web (e.g. "One R Tip a Day" Twitter account). Last, feel free to comment below with tip ideas or a post of code tips of your own, which we will be happy to incorporate into our next post.

Enjoy and get ready to frustratingly appreciate our tips!


The drop argument of the [] operator

This is not obvious and not widely known, but there is a logical argument drop that can be passed to the [] operator, and I’ll try to explain why it can be useful! Let’s first create a data frame with ten rows and three columns:

df <- data.frame(
  var1 = runif(10),
  var2 = runif(10),
  var3 = runif(10)
)
head(df)
#R>          var1      var2       var3
#R>  1 0.09664318 0.1987792 0.11771019
#R>  2 0.88404551 0.5546171 0.79025426
#R>  3 0.33420562 0.6793993 0.02752512
#R>  4 0.50591199 0.7024577 0.88804752
#R>  5 0.46855520 0.5521233 0.12339402
#R>  6 0.30723630 0.1160944 0.52954148

To extract the first column, I use the [] operator and either type the number of the column like so:

df[, 1]
#R>   [1] 0.09664318 0.88404551 0.33420562 0.50591199 0.46855520 0.30723630 0.62386364 0.72825928
#R>   [9] 0.66597937 0.51276290

or the name of the column to be extracted:

df[, 'var1']
#R>   [1] 0.09664318 0.88404551 0.33420562 0.50591199 0.46855520 0.30723630 0.62386364 0.72825928
#R>   [9] 0.66597937 0.51276290

Interestingly enough, this returns a vector, not a data.frame:

class(df)
#R>  [1] "data.frame"
class(df[, 'var1'])
#R>  [1] "numeric"

while if I extract two columns, I have a data frame:

class(df[, c('var1', 'var2')])
#R>  [1] "data.frame"

This behavior is actually very useful in many cases, as we are often happy to deal with a vector when we extract only one column. However, it can become an issue when we extract columns without knowing beforehand how many will be selected (typically when extracting according to a query that can return any number of columns). In such a case, if the number is one, we end up with a vector instead of a data.frame. The argument drop provides a workaround. By default it is set to TRUE and a 1-column data frame becomes a vector, but using drop = FALSE prevents this from happening. Let’s try this:

df[, 1, drop = FALSE]
#R>           var1
#R>  1  0.09664318
#R>  2  0.88404551
#R>  3  0.33420562
#R>  4  0.50591199
#R>  5  0.46855520
#R>  6  0.30723630
#R>  7  0.62386364
#R>  8  0.72825928
#R>  9  0.66597937
#R>  10 0.51276290

Let’s check its class:

class(df[, 1, drop = FALSE])
#R>  [1] "data.frame"

You can actually obtain the same result using the name of the column or its number without a comma (a data frame is a list of vectors that all have the same length, so you can basically subset the list!).

df[1]
#R>           var1
#R>  1  0.09664318
#R>  2  0.88404551
#R>  3  0.33420562
#R>  4  0.50591199
#R>  5  0.46855520
#R>  6  0.30723630
#R>  7  0.62386364
#R>  8  0.72825928
#R>  9  0.66597937
#R>  10 0.51276290
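
The list analogy can be checked directly. The short sketch below uses a small hypothetical data frame (toy, which is not the df used above) and compares single-bracket, double-bracket and $ extraction:

```r
# Hypothetical toy data frame, used only for illustration
toy <- data.frame(a = 1:3, b = c("x", "y", "z"))

one_col <- toy["a"]     # list-style subsetting: still a data frame
as_vec  <- toy[["a"]]   # list-style extraction: the underlying vector
by_name <- toy$a        # shorthand for the same extraction

class(one_col)                 # "data.frame"
class(as_vec)                  # "integer"
identical(as_vec, by_name)     # TRUE
identical(as_vec, toy[, "a"])  # TRUE: matrix-style indexing drops to a vector
```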

But if you need a specific selection of rows, you had better use drop!

df[2:5, 1, drop = FALSE]
#R>         var1
#R>  2 0.8840455
#R>  3 0.3342056
#R>  4 0.5059120
#R>  5 0.4685552
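
To illustrate the "unknown number of columns" situation described above, here is a minimal sketch. The helper select_cols() and the data frame toy are hypothetical names introduced for this example only; the helper keeps the columns whose names match a pattern and relies on drop = FALSE so that its output is always a data frame:

```r
# Hypothetical helper: keep the columns whose names match a pattern.
# drop = FALSE guarantees a data frame even when a single column matches.
select_cols <- function(data, pattern) {
  data[, grepl(pattern, names(data)), drop = FALSE]
}

toy <- data.frame(var1 = 1:3, var2 = 4:6, other = 7:9)

class(select_cols(toy, "^var"))            # two matches: "data.frame"
class(select_cols(toy, "^other"))          # one match: still "data.frame"
class(toy[, grepl("^other", names(toy))])  # without drop = FALSE: "integer"
ncol(select_cols(toy, "nomatch"))          # no match: a data frame with 0 columns
```

Code that receives the result can then safely call functions such as ncol() or names() without first checking whether it was handed a vector.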

Now you know ;-)


Get the citation of a package

Many researchers (this is especially TRUE in ecology) use R to carry out analyses and write papers for their research. When the time comes to cite a package, I guess they wonder how to do it. However, package authors actually provide this information in their package! Let’s have a look at the reference for the package knitr using the function citation() (this post was written with version 1.17; the output below was rendered with version 1.49):

citation("knitr")
#R>  To cite package 'knitr' in publications use:
#R>  
#R>    Xie Y (2024). _knitr: A General-Purpose Package for Dynamic Report Generation in R_. R
#R>    package version 1.49, <https://yihui.org/knitr/>.
#R>  
#R>    Yihui Xie (2015) Dynamic Documents with R and knitr. 2nd edition. Chapman and Hall/CRC.
#R>    ISBN 978-1498716963
#R>  
#R>    Yihui Xie (2014) knitr: A Comprehensive Tool for Reproducible Research in R. In
#R>    Victoria Stodden, Friedrich Leisch and Roger D. Peng, editors, Implementing
#R>    Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595
#R>  
#R>  To see these entries in BibTeX format, use 'print(<citation>, bibtex=TRUE)',
#R>  'toBibtex(.)', or set 'options(citation.bibtex.max=999)'.

As suggested in the message, we can even retrieve the reference list in BibTeX format with the toBibtex() function. Let’s do this:

toBibtex(citation("knitr"))
#R>  @Manual{,
#R>    title = {knitr: A General-Purpose Package for Dynamic Report Generation in R},
#R>    author = {Yihui Xie},
#R>    year = {2024},
#R>    note = {R package version 1.49},
#R>    url = {https://yihui.org/knitr/},
#R>  }
#R>  
#R>  @Book{,
#R>    title = {Dynamic Documents with {R} and knitr},
#R>    author = {Yihui Xie},
#R>    publisher = {Chapman and Hall/CRC},
#R>    address = {Boca Raton, Florida},
#R>    year = {2015},
#R>    edition = {2nd},
#R>    note = {ISBN 978-1498716963},
#R>    url = {https://yihui.org/knitr/},
#R>  }
#R>  
#R>  @InCollection{,
#R>    booktitle = {Implementing Reproducible Computational Research},
#R>    editor = {Victoria Stodden and Friedrich Leisch and Roger D. Peng},
#R>    title = {knitr: A Comprehensive Tool for Reproducible Research in {R}},
#R>    author = {Yihui Xie},
#R>    publisher = {Chapman and Hall/CRC},
#R>    year = {2014},
#R>    note = {ISBN 978-1466561595},
#R>  }

Even if you are not a LaTeX user, this can be very helpful, as such a file can be read by reference management software such as Zotero. So now let’s say I use the following command line:

cat(toBibtex(citation("knitr")), file='biblio.bib', sep='\n')

Then the biblio.bib file just created can be imported into your favorite reference manager.
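
The same idea extends to several packages at once. The sketch below is only one possible way of doing it: it assumes you want one file for a vector of package names (pkgs, entries and bib_file are names chosen for this example), skips the packages that are not installed, and writes to a temporary file rather than to biblio.bib:

```r
# Packages to cite; those that are not installed are skipped.
pkgs <- c("base", "knitr", "magrittr")
pkgs <- Filter(function(p) nzchar(system.file(package = p)), pkgs)

# BibTeX entries of each package as plain character lines,
# followed by an empty line to separate packages
entries <- lapply(pkgs, function(p) c(as.character(toBibtex(citation(p))), ""))

# Written to a temporary file here; use a path such as 'biblio.bib' in practice
bib_file <- tempfile(fileext = ".bib")
writeLines(unlist(entries), bib_file)

readLines(bib_file, n = 3)  # peek at the beginning of the file
```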


Using namespace

In R, functions are stored in packages, and adding a package is like adding a collection of functions. As you get more experienced with R, you likely know and use more and more packages. You might even come to the point where you have functions that have the same name but originate from different packages. If not, let me show you something:

library(magrittr)
df <- data.frame(
  var1 = runif(10),
  var2 = runif(10)
  )
extract(df, 'var1')
#R>          var1
#R>  1  0.1002149
#R>  2  0.6652283
#R>  3  0.2397889
#R>  4  0.2705796
#R>  5  0.9057282
#R>  6  0.5248083
#R>  7  0.3941103
#R>  8  0.4800691
#R>  9  0.3214861
#R>  10 0.8405378

Here I use the function extract() from the magrittr package, which acts like [], to extract the column var1 from df. This function is actually designed to be used with pipes (if this sounds weird, have a look at the magrittr package); for instance, when piping you can write df %$% extract(var1) or even df %>% '['('var1') to extract the same column. So far, so good. Now I load the raster package:

library(raster)
#R>  Loading required package: sp

and try the same extraction.

extract(df, 'var1')
#R>  Error: unable to find an inherited method for function 'extract' for signature 'x = "data.frame", y = "character"'

It does not work… Why? Briefly, extract() from raster is now the one being called (this is what the warning message on load said), and it does not get along well with data frames (this is the meaning of the error message). To overcome this, you can use an explicit namespace. To do so, prefix the function name with the name of the package followed by ::; this is basically the unique identifier of the function. Indeed, within a given package, functions have different names, and on CRAN packages must have different names, so the combination of the two is unique (this holds true if you only use packages from CRAN). Let’s use it:

magrittr::extract(df, 'var1')
#R>          var1
#R>  1  0.1002149
#R>  2  0.6652283
#R>  3  0.2397889
#R>  4  0.2705796
#R>  5  0.9057282
#R>  6  0.5248083
#R>  7  0.3941103
#R>  8  0.4800691
#R>  9  0.3214861
#R>  10 0.8405378

Using this is also very helpful when you develop a package and use functions from different packages. Even if you only write scripts but use a large number of functions from various packages, it can be helpful to keep track of which package each function comes from. Finally, note that this is not R specific at all; it is actually something very common in programming languages.
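
Masking can be reproduced without installing anything. The sketch below does not involve magrittr or raster: it deliberately defines a function named mean in the workspace (an artificial example, with variable names chosen for illustration) to show that the :: form still reaches the original function:

```r
# Mask base::mean() with a user-defined function of the same name
mean <- function(x, ...) "masked!"

masked   <- mean(1:10)        # calls the masking function: "masked!"
explicit <- base::mean(1:10)  # explicit namespace: 5.5

# The environment of a function tells you which package it belongs to
origin <- environmentName(environment(base::mean))  # "base"

rm(mean)                # remove the mask
restored <- mean(1:10)  # back to base::mean(): 5.5
```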


How to use non-exported functions?

Packages often contain functions that are not exported. These are often functions called by the exported functions that help structure the code of the package. However, when you try to understand how a package works, you may want to spend some time understanding how they work (especially given that they are not documented). There is actually a way to call them! Instead of using two colons (::), use three (:::)! Let’s have a look at the code of one of these functions from the knitr package (again version 1.17):

knitr:::.color.block

Interesting, isn’t it! To give you an idea of how frequent this can be, in this package there are 103 exported functions and 425 non-exported ones. Below are a few examples of exported functions followed by non-exported ones.

##------------------------ Exported functions
## knitr::pat_rnw             knitr::fig_path            knitr::all_patterns
## knitr::fig_chunk           knitr::clean_cache         knitr::kable
## knitr::knit_params_yaml    knitr::raw_output          knitr::render_sweave
## knitr::stitch_rhtml        knitr::include_graphics    knitr::Sweave2knitr
## knitr::hook_plot_asciidoc  knitr::hook_optipng        knitr::hook_plot_tex
## knitr::knit_print          knitr::knit_watch          knitr::knit2html
## knitr::render_html         knitr::knit2wp             knitr::rocco
## knitr::opts_template       knitr::normal_print        knitr::include_url
## knitr::combine_words       knitr::render_listings     knitr::current_input
##------------------------ (27/103 displayed)
##------------------------
##------------------------ Not-exported functions
## knitr:::.__NAMESPACE__.            knitr:::knit_expand
## knitr:::.__S3MethodsTable__.       knitr:::knit_filter
## knitr:::.base.pkgs                 knitr:::knit_global
## knitr:::.chunk.hook.html           knitr:::knit_handlers
## knitr:::.chunk.hook.tex            knitr:::knit_hooks
## knitr:::.color.block               knitr:::knit_log
## knitr:::.default.hooks             knitr:::knit_meta
## knitr:::.fmt.pat                   knitr:::knit_meta_add
## knitr:::.header.framed             knitr:::knit_params
## knitr:::.header.hi.html            knitr:::knit_params_handlers
## knitr:::.header.hi.tex             knitr:::knit_params_yaml
## knitr:::.header.maxwidth           knitr:::knit_patterns
## knitr:::.header.sweave.cmd         knitr:::knit_print
## knitr:::.img.attr                  knitr:::knit_print.default
##------------------------ (28/425 displayed)

I think that this could be very helpful when you want to understand exactly how a package works!
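
For readers who want to build such a list themselves, here is one possible approach (not necessarily the one used to produce the listing above). It compares the exports of a namespace with everything the namespace contains; the tools package is used only because it ships with R, and the variable names are illustrative:

```r
pkg <- "tools"         # any installed package works; 'tools' ships with R
ns  <- asNamespace(pkg)

exported  <- getNamespaceExports(pkg)    # names available through pkg::
all_names <- ls(ns, all.names = TRUE)    # everything defined in the namespace
internal  <- setdiff(all_names, exported)  # only reachable through pkg:::

# Keep track of which internal objects are functions
is_fun <- vapply(internal, function(n) is.function(get(n, envir = ns)),
                 logical(1))

length(exported)      # number of exported objects
sum(is_fun)           # number of non-exported functions
head(sort(internal))  # a few non-exported names
```

Keep in mind that non-exported functions are internal by design, so they may change or disappear between versions of a package without notice.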


The las argument of par()

I really enjoy using graphics to create plots in R. That being said, the default values always puzzle me! One I especially dislike is that the tick labels on the y-axis are drawn parallel to the axis, and are therefore rotated by 90 degrees…

vec <- runif(10)
plot(vec)

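Fortunately, this can readily be changed using the las argument of the par() function, which can take 4 values: 0 (default, labels parallel to the axis), 1 (always horizontal), 2 (always perpendicular to the axis) or 3 (always vertical). Let’s plot with the first three values and see the differences: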

par(mfrow = c(1,3), las = 0)
plot(vec, main = 'las = 0 (default)')
par(las = 1)
plot(vec, main = 'las=1')
par(las = 2)
plot(vec, main = 'las=2')

So, I personally prefer and use las=1!
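
Since par() changes the settings of the current graphics device, it is often convenient to restore the previous values once the plot is done. A minimal sketch, assuming a fresh device with default settings (op is just a variable name chosen here):

```r
vec <- runif(10)

op <- par(las = 1)           # par() returns the previous settings
plot(vec, main = "las = 1")  # horizontal tick labels on both axes
par(op)                      # restore the previous value of las

# las can also be set for a single plot without touching par()
plot(vec, las = 1, main = "las = 1, set in plot()")
```

The second form is handy for a one-off figure, while par(las = 1) applies to every subsequent plot on the device until it is reset.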


That’s all for number 2 of this series, see you for the next tips!

Display information relative to the R session used to render this post.
sessionInfo()
#R>  R version 4.4.2 (2024-10-31)
#R>  Platform: x86_64-pc-linux-gnu
#R>  Running under: Ubuntu 22.04.5 LTS
#R>  
#R>  Matrix products: default
#R>  BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#R>  LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
#R>  
#R>  locale:
#R>   [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8    
#R>   [5] LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C             
#R>   [9] LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#R>  
#R>  time zone: UTC
#R>  tzcode source: system (glibc)
#R>  
#R>  attached base packages:
#R>  [1] stats     graphics  grDevices utils     datasets  methods   base     
#R>  
#R>  other attached packages:
#R>  [1] raster_3.6-30     sp_2.1-4          magrittr_2.0.3    inSilecoRef_0.1.1
#R>  
#R>  loaded via a namespace (and not attached):
#R>   [1] sass_0.4.9        generics_0.1.3    xml2_1.3.6        lattice_0.22-6    blogdown_1.19    
#R>   [6] stringi_1.8.4     httpcode_0.3.0    digest_0.6.37     grid_4.4.2        evaluate_1.0.1   
#R>  [11] bookdown_0.41     fastmap_1.2.0     plyr_1.8.9        jsonlite_1.8.9    backports_1.5.0  
#R>  [16] crul_1.5.0        promises_1.3.2    codetools_0.2-20  bibtex_0.5.1      jquerylib_0.1.4  
#R>  [21] cli_3.6.3         shiny_1.10.0      rlang_1.1.4       cachem_1.1.0      yaml_2.3.10      
#R>  [26] tools_4.4.2       dplyr_1.1.4       httpuv_1.6.15     DT_0.33           rcrossref_1.2.0  
#R>  [31] curl_6.0.1        vctrs_0.6.5       R6_2.5.1          mime_0.12         lifecycle_1.0.4  
#R>  [36] stringr_1.5.1     fs_1.6.5          htmlwidgets_1.6.4 miniUI_0.1.1.1    pkgconfig_2.0.3  
#R>  [41] terra_1.8-5       pillar_1.10.0     bslib_0.8.0       later_1.4.1       glue_1.8.0       
#R>  [46] Rcpp_1.0.13-1     systemfonts_1.1.0 xfun_0.49         tibble_3.2.1      tidyselect_1.2.1 
#R>  [51] knitr_1.49        xtable_1.8-4      htmltools_0.5.8.1 svglite_2.1.3     rmarkdown_2.29   
#R>  [56] compiler_4.4.2

Edits

Apr 23, 2022 -- Fix typos.
May 24, 2022 -- Add session info section.
Feb 4, 2023 -- Edit headers.