What You May Take for Granted: The Difference Between & and && in R

After reading the article, which one do you think represents a “&” and a “&&”?

If you’ve been using R for a while now, you may have come across the double “&” operator. Most people who’ve coded before, whether in R or some other language, have an intuitive feel for what the “&” represents. It’s a logical AND statement. “The sky is blue AND cows can fly” is a logically false statement because even though the sky is blue, the second part of the statement is false. So what the heck then does a “&&” represent?

If you look up the help page using ?"&&", you will read “& and && indicate logical AND…The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector.” What does this mean? Let’s do a quick illustrative example.
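Here is a minimal sketch of such an example. The vector x and the two conditions are assumptions chosen for illustration. Because the result of applying “&&” to longer vectors depends on the R version, the first-element comparison is written out explicitly and the raw call is wrapped in tryCatch() so the sketch runs everywhere.

```r
# Hypothetical example vector (an assumption for illustration)
x <- 1:5

# Elementwise AND: one result per element of x
elementwise <- x > 1 & x < 5
print(elementwise)
# [1] FALSE  TRUE  TRUE  TRUE FALSE

# Scalar AND on the first element only, written explicitly
first_only <- x[1] > 1 && x[1] < 5
print(first_only)
# [1] FALSE

# Applying && directly to the full vectors is version-dependent:
# older R silently used the first elements, R 4.2.x warns, R 4.3.0+ errors.
long_form <- tryCatch(
  x > 1 && x < 5,
  error = function(e) "error",
  warning = function(w) "warning"
)
print(long_form)
```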

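This is what you should see in your console. See how the “&&” operator returned a logical vector of length 1: it only looked at the first element, which in this case was 1, and it evaluated to FALSE (it failed the first condition, x > 1). In programming, stopping as soon as the result is known is called short-circuit evaluation. Note that this describes older versions of R: R 4.2.0 started warning when either side of “&&” has more than one element, and since R 4.3.0 it is an error.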

So, why is this even useful? Simply put, it’s a matter of efficiency. Let’s use a trivial example: suppose we have a vector of one billion elements. Note: the invisible() function suppresses printed output; trust me, you don’t want one billion things getting printed onto your console.
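A rough sketch of such a comparison follows. It assumes a vector of ten million elements rather than one billion so that it fits in modest memory, and it checks the first element explicitly so that it also runs on recent R versions. The variable names are illustrative, and timings will differ from machine to machine.

```r
# Assumption: 1e7 elements instead of one billion, to keep memory use modest
n <- 1e7
x <- seq_len(n)

# Elementwise check: every element is compared on both sides of &
time_elementwise <- system.time(res_all <- x > 1 & x < 5)

# Scalar check on the first element only; valid on every R version
time_scalar <- system.time(res_first <- x[1] > 1 && x[1] < 5)

# Assigning inside system.time() avoids printing the results themselves
print(time_elementwise["elapsed"])
print(time_scalar["elapsed"])
```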

On my machine, the “&” statement took 37 seconds whereas the “&&” statement took 12.9 seconds, roughly a third of the time. This speed becomes increasingly important in the functions that we R users use in our daily lives. Oftentimes, well-maintained functions do a lot of internal checking and testing, so that when we give them inputs that don’t make sense, they stop and try to give us a (hopefully) helpful error message. And if we’re using higher-dimensional inputs like vectors and data frames as opposed to single numbers, it only takes one element of a vector or one column/row of a data frame to break something, so as long as we catch it once, there’s no need to check every single element. It’d be too slow.
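As a sketch of that kind of internal checking, here is a hypothetical helper, safe_mean(), which is not taken from any package. It uses the scalar “||” for its input check and anyNA(), base R's possibly faster equivalent of any(is.na(x)), to reject the input as soon as a missing value is known to exist.

```r
# Hypothetical helper for illustration; not from any package
safe_mean <- function(x) {
  # Scalar checks: || stops at the first TRUE, so length() is only
  # consulted once x is known to be numeric
  if (!is.numeric(x) || length(x) == 0) {
    stop("`x` must be a non-empty numeric vector.")
  }
  # A single missing value is enough to reject the input
  if (anyNA(x)) {
    stop("`x` contains missing values.")
  }
  mean(x)
}

good <- safe_mean(c(2, 4, 6))
print(good)
# [1] 4

# Bad input stops early with a readable message
msg <- tryCatch(safe_mean(c(1, NA)), error = function(e) conditionMessage(e))
print(msg)
# [1] "`x` contains missing values."
```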

As a case study, let’s take a look under the hood at a function I was using lately: gtsummary::tbl_summary(). This is a tangent, but gtsummary::tbl_summary() is a useful function that outputs a table of descriptive statistics, much like the descriptive tables you’d see in published papers (the “Table 1s”), and it notably meshes well with the tidyverse. To look at the source code of a function, use getAnywhere(). When we run getAnywhere(tbl_summary) we get a whole bunch of code, and within it, I want to highlight a particular example of “&&”.
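The highlighted line is not reproduced in this copy. Going by the paraphrase in the next paragraph, it would have had roughly the shape below. The wrapper name by_has_missing(), the arguments data and by, and the toy data frame are assumptions for illustration, not gtsummary's actual source.

```r
# Hypothetical wrapper; reconstructed from the prose, not copied from gtsummary
by_has_missing <- function(data, by = NULL) {
  # "by is not empty" AND "the by column contains at least one NA"
  if (!is.null(by) && sum(is.na(data[[by]])) > 0) {
    return(TRUE)
  }
  FALSE
}

# Toy data (assumed) to exercise the condition
df <- data.frame(group = c("a", NA, "b"), value = 1:3)

by_has_missing(df, by = "group")  # TRUE: the group column has one NA
by_has_missing(df, by = "value")  # FALSE: no NAs in value
by_has_missing(df)                # FALSE: by is NULL
```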

This statement basically reads “if by is not empty and the total number of NAs in the column subsetted with by is greater than 0…” Here, I actually don’t believe the “&&” is strictly necessary. The reason is that is.null() and sum() > 0 both return single-element vectors. So, an elementwise comparison gives the same result as comparing the first elements when the vectors are only of length 1. Nevertheless, using “&&” or the complementary “||” (the OR operator) is good practice when writing functions, especially those that you’re writing for the greater community, because you always want to ensure that your functions are running quickly and efficiently.
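One further difference is worth a quick sketch, offered as an aside: “&&” and “||” skip the right-hand side once the left-hand side has settled the answer, while “&” evaluates both sides first. The stop() calls below are placeholders for any right-hand side that would fail, and the variable names are illustrative.

```r
by <- NULL

# && stops once the left side is FALSE, so stop() is never reached
short <- !is.null(by) && stop("right-hand side was evaluated")
print(short)
# [1] FALSE

# & evaluates both sides first, so the same expression raises the error
both <- tryCatch(
  !is.null(by) & stop("right-hand side was evaluated"),
  error = function(e) conditionMessage(e)
)
print(both)
# [1] "right-hand side was evaluated"

# || mirrors this: it stops as soon as the left side is TRUE
either <- is.null(by) || stop("right-hand side was evaluated")
print(either)
# [1] TRUE
```

So even when both sides have length 1, the scalar forms can guard an expression that would otherwise fail.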

Data Scientist at Merck. Tidyverse enthusiast and a neRd.
