kafsemo.org

replicate in R

2011-12-14

I learned a new trick in R: replicate, for repeating random number generation. I also learned that it’s not necessary to understand R to use it.

This was prompted by a discussion of queueing theory, with code in R (“queue”). For a simpler case, suppose we’re investigating random distributions by counting how many sixes come up when we roll a die ten times.

Function arguments in R are passed by value, and each argument expression is evaluated at most once. This means that, while sum(sample(1:6, 10, replace = TRUE) == 6) gives a random count, rep(sum(sample(1:6, 10, replace = TRUE) == 6), 10) performs one set of ten rolls, counts its sixes, and then repeats that single count ten times, which is not what we want:

 [1] 2 2 2 2 2 2 2 2 2 2
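A quick way to confirm that rep reuses a single draw is to check that every entry in its result is identical. This is a minimal sketch; the seed and the variable name repeated are arbitrary choices, not taken from the post.

```r
set.seed(42)  # arbitrary seed, only to make the run reproducible

# The argument is evaluated once, producing one count; rep() copies it
repeated <- rep(sum(sample(1:6, 10, replace = TRUE) == 6), 10)

print(repeated)
print(length(unique(repeated)))  # 1: all ten entries are the same value
```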

I’d been using this technique to make each repetition pick a new number:

> sapply(1:10, function(x){sum(sample(seq(1,6), 10, replace = TRUE) == 6)})
 [1] 1 0 1 0 1 1 0 1 2 0
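The same per-element idea can also be written with vapply, which additionally checks that each call returns exactly one integer. This is a sketch of an alternative, not what the post used; the index argument i is deliberately ignored, since only the fresh call to sample() matters.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Call the anonymous function once per element of 1:10; i is unused, but
# each call re-runs sample(), so every count comes from new rolls
counts <- vapply(seq_len(10), function(i) {
  sum(sample(1:6, 10, replace = TRUE) == 6)  # sum of logicals is an integer
}, integer(1))

print(counts)
```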

Here, the anonymous function is called once for each element of 1:10 (the element itself is ignored), and each call runs sample() afresh. R is also lazy: the expressions passed as arguments aren’t evaluated until they’re used. To demonstrate:

> sideEffect <- function(s) {print(paste("Evaluated for", s)); s;}

> f <- function(x, y) {x}
> dummy <- f(sideEffect("First"), sideEffect("Second"))
[1] "Evaluated for First"

> f <- function(x, y) {c(x,y)}
> dummy <- f(sideEffect("First"), sideEffect("Second"))
[1] "Evaluated for First"
[1] "Evaluated for Second"

Here, the first version of f only uses its first argument, so the second sideEffect call is never evaluated.
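Laziness also means an argument is evaluated at most once: after its first use, R reuses the cached value of the promise. The sketch below counts evaluations with a hypothetical helper, counted, to show that referring to an argument three times still runs its expression only once, so laziness by itself cannot repeat a dice roll.

```r
calls <- 0  # how many times the argument expression has run

# Hypothetical helper: records each evaluation, then returns its input
counted <- function(v) {
  calls <<- calls + 1
  v
}

g <- function(x) c(x, x, x)  # refers to x three times

result <- g(counted(5))
print(result)  # 5 5 5
print(calls)   # 1: the promise for x was forced once, then cached
```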

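Taken together, pass-by-value and lazy evaluation still don’t let us write a function that evaluates its argument repeatedly, because each argument is evaluated at most once. However, R is also flexible enough to let a function capture an argument’s unevaluated expression and rewrite it before evaluating it: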

> replicate
function (n, expr, simplify = "array") 
sapply(integer(n), eval.parent(substitute(function(...) expr)), 
    simplify = simplify)
<environment: namespace:base>

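Here, substitute() splices the unevaluated expr into the body of function(...) expr, eval.parent() turns that into a function defined in the caller’s environment, and sapply() calls it n times, so the expression is evaluated afresh on every call. So replicate(10, sum(sample(seq(1,6), 10, replace = TRUE) == 6)) gives me exactly what I want.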
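The same mechanism can be reproduced by hand. This is a minimal sketch under a hypothetical name, myReplicate, that leaves out replicate’s simplify argument; it mirrors the base source shown above rather than replacing it.

```r
# Hypothetical re-implementation of the idea behind base::replicate
myReplicate <- function(n, expr) {
  # substitute() builds the call `function(...) <caller's expression>`;
  # eval.parent() evaluates it in the caller's environment, yielding a
  # closure whose body is the original, still unevaluated, expression
  f <- eval.parent(substitute(function(...) expr))
  # integer(n) is n zeros; each is passed to f and swallowed by ...,
  # so the body runs once per call
  sapply(integer(n), f)
}

set.seed(7)  # arbitrary seed for reproducibility
sixes <- myReplicate(10, sum(sample(1:6, 10, replace = TRUE) == 6))
print(sixes)
```

Because the closure is created in the caller’s environment, the expression can still refer to the caller’s variables, just as it could if it were evaluated directly.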

(Music: AFI, “Fall Children”)