A few thoughts about pipes in R
August 25, 2023
Using pipes in R makes code cleaner. There are two good options for pipes: magrittr, which brings the forward pipe %>% alongside four other pipes, and the native pipe |>, introduced in R 4.1.0. I have been piping for years: I started with the pipes from the magrittr package, and now I use the native pipe.
Piping in R
Back in 2014, I discovered the forward pipe for R introduced in magrittr, and I have not stopped piping since, although my piping habits have evolved over time, especially with the introduction of the native pipe in R 4.1.
When I started using pipes in R, I already had some experience with the bash pipe, |, which passes the output of one command to the input of the next and thus allows operations to be chained. But using pipes in R was a major breakthrough: with a simple infix operator, lines of code involving a collection of nested function calls were suddenly turned into one readable data recipe. Let’s take an example and create a data pipeline where we apply a statistical model, model_1(), to the data set data_1 after two preprocessing steps, transform_1() and transform_2(). Without pipes, the code is a set of nested calls, model_1(transform_2(transform_1(data_1))), which has to be read from the inside out.
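To make the example concrete, here is a minimal sketch in which data_1, transform_1(), transform_2() and model_1() are hypothetical stand-ins built on the mtcars dataset (they are not from the original recipe): keep the four-cylinder cars, convert the weight to kilograms and fit a linear model.

```r
# Hypothetical stand-ins for data_1, transform_1(), transform_2() and model_1()
data_1 <- mtcars
transform_1 <- function(df) subset(df, cyl == 4)                # keep four-cylinder cars
transform_2 <- function(df) transform(df, wt_kg = wt * 453.592) # wt is in 1000 lbs
model_1 <- function(df) lm(mpg ~ wt_kg, data = df)              # simple linear model

# Without pipes: the last step, model_1(), is the first thing we read,
# and the data sits in the innermost call
fit <- model_1(
  transform_2(
    transform_1(data_1)
  )
)
coef(fit)
```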
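For now, let’s refer to the forward pipe as %pipe%. With this pipe, one would write transform_2(data_1) as data_1 %pipe% transform_2(). data_1 is the left-hand side of the pipe (often abbreviated lhs) and transform_2() is its right-hand side (rhs). Rewritten with the pipe, the whole recipe reads data_1 %pipe% transform_1() %pipe% transform_2() %pipe% model_1(), usually written with one step per line.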
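To see what such an operator does, here is a toy %pipe% written in base R. It is only a sketch of the idea, not how magrittr or the native pipe are implemented: it captures the rhs call unevaluated, inserts the lhs as its first argument and evaluates the new call in the caller's environment.

```r
# A toy forward pipe (hypothetical %pipe%, for illustration only):
# rewrites `lhs %pipe% f(a, b)` into `f(lhs, a, b)` and evaluates it
`%pipe%` <- function(lhs, rhs) {
  rhs_call <- substitute(rhs)   # unevaluated rhs, e.g. subset(cyl == 4)
  parts <- as.list(rhs_call)    # parts[[1]] is the function, the rest its arguments
  new_call <- as.call(c(parts[1], list(quote(.lhs)), parts[-1]))
  # .lhs is looked up in the list first, everything else in the caller's environment
  eval(new_call, list(.lhs = lhs), parent.frame())
}

# %any% operators group from left to right, so the steps chain naturally
n_piped <- mtcars %pipe%
  subset(cyl == 4) %pipe%
  nrow()
n_nested <- nrow(subset(mtcars, cyl == 4))
identical(n_piped, n_nested)
```

Evaluating the rewritten call with parent.frame() as the enclosure is what lets the rhs refer to objects defined where the pipeline is written.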
There are two main facts one quickly grasps when looking at the two blocks of code:
- with pipes, you stop reading backwards: data_1 is at the beginning of the block, and model_1() is now on the last line rather than the first;
- with pipes, there are fewer parentheses to keep track of, so the code is more readable.
A little more subtle is that it is easier to comment out parts of the recipe. Say we need to comment out transform_2(): without pipes, with one call per line, we have to comment out both the line that opens the call and the line holding its closing parenthesis, whereas with pipes, commenting out the single line holding transform_2() is enough.
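Here is a small sketch of the difference, with base functions standing in for the hypothetical steps: abs(), sqrt() and sum() play the roles of transform_1(), transform_2() and model_1(), and sqrt() is the step being commented out. It uses the native pipe, but any forward pipe behaves the same way here.

```r
x <- c(-4, 9, -16)

# Without pipes: commenting out sqrt() means editing two lines,
# the one opening the call and the one closing it
res_nested <- sum(
  # sqrt(
    abs(x)
  # )
)

# With pipes: a single line is commented out
res_piped <- x |>
  abs() |>
  # sqrt() |>
  sum()

c(res_nested, res_piped)
```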
This is not a major concern here, but it does help in more complex cases. Having code that is easy to read and easy to manipulate is particularly relevant for the R community because we are a group of data recipe writers, and our recipes may include tens of steps. It is thus no surprise that the community quickly adopted magrittr's pipe, and the tidyverse tremendously helped in popularizing pipes (see, e.g., commit 89aaa9a8b of dplyr, from April 14th, 2014, where the magrittr pipe was adopted).
magrittr pipes
A package of internals
I am assuming that most R users are familiar with the forward pipe %>% and its placeholder . (the symbol representing the object being forwarded). I am also assuming that most R users have used it through the metapackage tidyverse, or one of the packages it includes, most likely dplyr. So here, instead of focusing on how to use the forward pipe, I would like to mention a few technical details as well as the other pipes magrittr includes.
magrittr is a package that brings the forward pipe %>% along with four other pipes, %<>%, %$%, %!>% and %T>%, as well as several functions that can be used with the pipe.
If you look at the source code, pipes are defined in pipe.R; for instance, the forward pipe is defined in lines 130 to 137 of that file (see pipe.R L130-L137). The last line of this definition is a call to the primitive .External2(), which calls an external C function, magrittr_pipe(), that returns a SEXP (the C type R uses to represent its objects, see Rinternals.h). magrittr_pipe() itself is defined in pipe.c, in the src folder of the package.
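The R side of this can also be checked from a session rather than in the sources. The sketch below assumes magrittr 2.0 or later is installed (the version in which the pipe was rewritten in C): it prints the body of %>% and checks that it hands over to C code through .External2().

```r
# Assumes magrittr >= 2.0 is installed
library(magrittr)

# Deparse the body of the forward pipe to inspect its R-level definition
pipe_body <- deparse(body(`%>%`))
cat(pipe_body, sep = "\n")

# Does the definition call into C through .External2()?
uses_external2 <- any(grepl(".External2", pipe_body, fixed = TRUE))
uses_external2
```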
Hence, when you load magrittr, you are using new functions backed by internal code: the forward pipe along with four other pipes:
- %<>%: the assignment pipe;
- %$%: the exposition pipe;
- %!>%: the eager pipe;
- %T>%: the Tee pipe.
The four additional pipes
Let’s load magrittr and look at an example for each of these pipes. The assignment pipe allows you to assign the result while piping: instead of starting a pipeline with CO2b <- CO2b %>% to modify a copy of the dataset CO2 in two steps, you can start it with CO2b %<>%.
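A minimal sketch, where filtering on conc is a hypothetical step chosen for illustration: both versions produce the same modified copy of CO2.

```r
library(magrittr)

# Two steps: pipe the data, then assign the result back
CO2b <- CO2
CO2b <- CO2b %>% subset(conc > 500)

# Assignment pipe: CO2c is piped forward and the result is assigned back to CO2c
CO2c <- CO2
CO2c %<>% subset(conc > 500)

identical(CO2b, CO2c)
```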
With the Tee pipe, you can call a function for its side effect without including its output in the pipeline: the lhs is passed on instead. This can be very handy for print and plot functions.
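A small sketch with cat(), which prints its input and returns NULL: thanks to %T>%, the pipeline carries on with the lhs rather than with the NULL returned by cat().

```r
library(magrittr)

# cat() is called for its side effect (printing); %T>% then passes
# c(3, 1, 2) itself, not cat()'s NULL result, on to sort()
sorted <- c(3, 1, 2) %T>% cat("\n") %>% sort()
sorted
```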
The exposition pipe exposes the names of the lhs of the pipe so they can be used directly in the rhs of the pipe.
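A minimal sketch on CO2, with columns picked for illustration: %$% lets cor(), which has no data argument, refer to the columns conc and uptake by name, much like with() does.

```r
library(magrittr)

# The columns of CO2 are exposed to the rhs expression
r_exposed <- CO2 %$% cor(conc, uptake)

# Equivalent base R call, spelling out each column
r_base <- cor(CO2$conc, CO2$uptake)

identical(r_exposed, r_base)
```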
Finally, the eager pipe overcomes R's lazy evaluation, a behaviour that is beyond the scope of this post (if you are curious, have a look at this great blog post by Colin Fay). The difference shows when the piped functions print a message on evaluation: with the lazy forward pipe, the messages do not follow the order of the pipeline, whereas with the eager pipe they do.
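Here is a sketch with two functions that print a message when their body runs. collect_messages() is a hypothetical helper, introduced here only to record the messages in the order they are emitted.

```r
library(magrittr)

# Two functions that print a message when their body runs
f <- function(x) { message("foo"); x }
g <- function(x) { message("bar"); x }

# Hypothetical helper: evaluate expr and return the messages it emits, in order
collect_messages <- function(expr) {
  msgs <- character()
  withCallingHandlers(expr, message = function(m) {
    msgs <<- c(msgs, trimws(conditionMessage(m)))
    invokeRestart("muffleMessage")  # keep the console quiet
  })
  msgs
}

# Lazy pipe: g() starts running before its argument f(NULL) is forced
lazy_order <- collect_messages(NULL %>% f() %>% g())
# Eager pipe: each step is evaluated before the next one starts
eager_order <- collect_messages(NULL %!>% f() %!>% g())

lazy_order
eager_order
```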
The native pipe
R 4.1 introduced the native pipe |>. In the NEWS file, section CHANGES IN R 4.1.0, the pipe was announced with the following message:
R now provides a simple native forward pipe syntax |>. The simple form of the forward pipe inserts the left-hand side as the first argument in the right-hand side call. The pipe implementation as a syntax transformation was motivated by suggestions from Jim Hester and Lionel Henry.
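Because the pipe is a syntax transformation, its effect is visible without evaluating anything: quote() returns the expression as parsed, with the lhs already inserted as the first argument. This sketch assumes R >= 4.1.0; data_1, transform_1() and transform_2() need not exist since nothing is evaluated, and n = 2 is a hypothetical extra argument.

```r
# quote() returns the parsed expression without evaluating it
piped <- quote(data_1 |> transform_1() |> transform_2(n = 2))
piped

# The parser has already rewritten the pipeline as nested calls
identical(piped, quote(transform_2(transform_1(data_1), n = 2)))
```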
Several blog posts have explained how to use it (e.g. this blog post on Towards Data Science). I was curious about the changes that came with such a new feature, so I did a quick search in the source code, using the mirror available on GitHub at https://github.com/wch/r-source.
I then checked the files modified in commit a1425adea5, which introduced the pipe: 7 files were modified to do so. Note that with git show a1425adea5, one can quickly check all the changes, and that adding -- path (git show a1425adea5 -- path) focuses on a specific file.
Investigating further, I found that the symbol |> is declared in names.c and that xxpipe(), in gram.y, is essentially the definition of the pipe: this is the code that makes the pipe work.
Though I don’t see any advantage in doing so, it is possible to combine the two pipes.
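For instance, here is a sketch mixing both pipes in one pipeline, assuming magrittr is installed. Since %>% and |> have the same precedence and both group from left to right, the expression below is equivalent to nrow(CO2 %>% subset(conc > 500)).

```r
library(magrittr)

# Parsed as (CO2 %>% subset(conc > 500)) |> nrow()
n_mixed <- CO2 %>% subset(conc > 500) |> nrow()
n_mixed
```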
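Even though the code of xxpipe() has changed in R 4.3.2 and 4.3.3, we can still demonstrate how to hit the two errors captured in the code, namely a rhs that is not a function call and a rhs that calls a syntactically special function such as function (the error messages differ slightly across versions).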
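Since these errors are raised by the parser, they can be triggered with parse() alone, without running anything. In this sketch, fails_to_parse() is a hypothetical helper; the last line shows the usual workaround for anonymous functions, wrapping them in parentheses and calling them.

```r
# Hypothetical helper: TRUE if the code fails to parse
fails_to_parse <- function(code) {
  inherits(tryCatch(parse(text = code), error = function(e) e), "error")
}

fails_to_parse("1:3 |> sum")                      # rhs is not a function call
fails_to_parse("1:3 |> function(x) x")            # rhs calls the special function `function`
fails_to_parse("1:3 |> (function(x) sum(x))()")   # parenthesised anonymous function

# The exact wording depends on the R version
tryCatch(parse(text = "1:3 |> sum"), error = conditionMessage)
```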
One of the main reasons for the changes in the xxpipe() code is the recent introduction of the placeholder, _, in R 4.2.0 (see CHANGES IN R 4.2.0 in the NEWS file):
In a forward pipe |> expression it is now possible to use a named argument with the placeholder _ in the rhs call to specify where the lhs is to be inserted. The placeholder can only appear once on the rhs.
The placeholder was then extended in R 4.3.0 (see CHANGES IN R 4.3.0 in the NEWS file):
As an experimental feature the placeholder _ can now also be used in the rhs of a forward pipe |> expression as the first argument in an extraction call, such as _$coef. More generally, it can be used as the head of a chain of extractions, such as _$coef[[2]].
Combining both features, R >= 4.3.0 lets you pass the lhs to a named argument and then extract an element of the result within the same pipeline.
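A minimal sketch on mtcars, with a model chosen for illustration. It requires R >= 4.3.0: data = _ alone works from R 4.2.0, but using _ as the head of an extraction chain needs 4.3.0.

```r
# Requires R >= 4.3.0
# `_` is first passed to the named argument `data`, then heads an extraction
# chain; `$coef` partially matches the `coefficients` element of the lm object
slope <- mtcars |> lm(mpg ~ wt, data = _) |> _$coef[[2]]
slope
```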
When coding in the console, I have been using the right assignment operator -> more and more frequently, and it really comes in handy. As you may know, R has several assignment operators, for historical reasons mentioned by Chambers (2016) in a footnote on page 73:
The specific choice of <- dates back to the first version of S. We chose it in order to emphasize that the left and right operands are entirely different: a target on the left and a computed value on the right. Later versions of S and R accept the = operator, but for exposition the original choice remains clearer.
R users mainly use <- and sometimes = (frequently favoured by users with experience in other programming languages), but -> feels especially appropriate when piping, as it neatly concludes the data recipe.
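A short sketch with a hypothetical recipe on mtcars: because -> has a lower precedence than the pipe, the value of the whole pipeline is assigned, and the name of the result comes last, where the recipe ends.

```r
# The whole pipeline is evaluated first, then assigned to n_four
mtcars |>
  subset(cyl == 4) |>
  nrow() -> n_four
n_four
```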
Final remarks
In this post, I wrote down a few thoughts about the native pipe and the magrittr pipes without attempting to compare them. The main differences between %>% and |> are summarized in a recent blog post by Hadley Wickham. Of course, there is only one native pipe and five magrittr pipes. But the most significant difference concerns the placeholders: . comes with more features than _. That said, given the latest experimental feature of _ in R 4.3.0, this may soon no longer hold true.
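Here is a sketch of one such difference, on a hypothetical named vector. magrittr's . can be used several times; when it appears as a direct argument, the lhs is not also inserted as the first argument. The native _ must be passed to a named argument and may appear only once, so here the native pipe needs an anonymous function (the \(v) shorthand, R >= 4.1.0) to do the same thing.

```r
library(magrittr)

x <- c(a = 1, b = 2)

# magrittr: `.` is used twice, giving setNames(x, toupper(names(x)))
up_magrittr <- x %>% setNames(., toupper(names(.)))

# Native pipe: `_` cannot be used twice, so wrap the step in a lambda and call it
up_native <- x |> (\(v) setNames(v, toupper(names(v))))()

up_magrittr
up_native
```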
The tidyverse community still recommends the use of the magrittr forward pipe, and even provides a style guide for it: https://style.tidyverse.org/pipes.html. I now code almost exclusively with the native pipe (following the same guidelines). I apply it everywhere, and I have a bunch of shortcuts for the functions I use most frequently in the console, e.g. to do |> names() or |> class(). I prefer |> mostly because it is native, meaning that I don’t need to load any package to use it. There are two additional minor pros: it is only two characters long, and Julia (which I use frequently) uses the same symbol. I even use it in packages, and as long as I am not using the placeholder, the package only requires R >= 4.1.0, which is not bad (the oldrel is currently R 4.2.3).
References
Chambers JM. 2016. Extending R. Boca Raton, London, New York: CRC Press.