Tuesday, September 07, 2010

Generating random numbers in R

Original link : http://blog.revolutionanalytics.com/2009/02/how-to-choose-a-random-number-in-r.html


Generate a random number between 5.0 and 7.5

If you want to generate a decimal number where any value (including fractional values) between the stated minimum and maximum is equally likely, use the runif function. This function generates values from the Uniform distribution. Here's how to generate one random number between 5.0 and 7.5:

> x1 <- runif(1, 5.0, 7.5)
> x1
[1] 6.715697

Of course, when you run this, you'll get a different number, but it will definitely be between 5.0 and 7.5. You won't get the values 5.0 or 7.5 exactly, either.

If you want to generate multiple random values, don't use a loop. You can generate several values at once by specifying the number of values you want as the first argument to runif. Here's how to generate 10 values between 5.0 and 7.5:

> x2 <- runif(10, 5.0, 7.5)
> x2
[1] 6.339188 5.311788 7.099009 5.746380 6.720383 7.433535 7.159988
[8] 5.047628 7.011670 7.030854
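As a quick sanity check on the range claim, here is a small sketch that draws a large batch in a single call and inspects the extremes. The seed and the batch size are arbitrary choices made for this illustration; they are not part of the original example.

```r
# Illustrative check; the seed (42) and batch size (100000) are arbitrary assumptions.
set.seed(42)
big <- runif(100000, 5.0, 7.5)   # one vectorized call, no loop

range(big)                        # smallest and largest values drawn
all(big > 5.0 & big < 7.5)        # TRUE if every value lies strictly inside the interval
mean(big)                         # should be close to the midpoint, 6.25
```

With this many draws the extremes come very close to 5.0 and 7.5 without reaching them.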

Generate a random integer between 1 and 10

This looks like the same exercise as the last one, but now we only want whole numbers, not fractional values. For that, we use the sample function:

> x3 <- sample(1:10, 1)
> x3
[1] 4

The first argument is a vector of valid numbers to generate (here, the numbers 1 to 10), and the second argument indicates that one number should be returned. If we want to generate more than one random number, and the same number may come up more than once, we have to add an additional argument to indicate that repeats are allowed:

> x4 <- sample(1:10, 5, replace=TRUE)
> x4
[1] 6 9 7 6 5

Note the number 6 appears twice in the 5 numbers generated. (Here's a fun exercise: what is the probability of running this command and having no repeats in the 5 numbers generated?)
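Spoiler for the exercise: one way to work it out, sketched below, is to count the orderings with no repeats (10 choices for the first number, 9 for the second, and so on) and divide by all 10^5 possible outcomes. The simulation that follows is only a rough cross-check; its seed and number of trials are arbitrary assumptions.

```r
# Exact probability that 5 draws from 1:10 with replacement are all distinct:
# (10 * 9 * 8 * 7 * 6) / 10^5
p_exact <- prod(10:6) / 10^5
p_exact                           # 0.3024

# Monte Carlo estimate; the seed and number of trials are arbitrary assumptions.
set.seed(1)
trials <- 20000
no_repeats <- replicate(trials, anyDuplicated(sample(1:10, 5, replace = TRUE)) == 0)
mean(no_repeats)                  # should land near p_exact
```

So roughly 3 runs in 10 produce five distinct numbers.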

Select 6 random numbers between 1 and 40, without replacement

If you wanted to simulate the lotto game common to many countries, where you randomly select 6 balls from 40 (each labelled with a number from 1 to 40), you'd again use the sample function, but this time without replacement:

> x5 <- sample(1:40, 6, replace=FALSE)
> x5
[1] 10 21 29 12  7 31

You'll get a different 6 numbers when you run this, but they'll all be between 1 and 40 (inclusive), and no number will repeat. Also, you don't actually need to include the replace=FALSE option -- sampling without replacement is the default -- but it doesn't hurt to include it for clarity.
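If you'd like to confirm those properties rather than take them on trust, a minimal sketch along these lines should do. The seed is an arbitrary assumption, used only so the draw can be repeated.

```r
# Illustrative only; the seed is an arbitrary assumption.
set.seed(2010)
draw <- sample(1:40, 6)           # replace = FALSE is the default

anyDuplicated(draw) == 0          # TRUE: no ball is drawn twice
all(draw %in% 1:40)               # TRUE: every ball is in the valid range
sort(draw)                        # the same numbers in ascending order
```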

Select 10 items from a list of 50

You can use this same idea to generate a random subset of any vector, even one that doesn't contain numbers. For example, to select 10 distinct states of the US at random:

> sample(state.name, 10)
[1] "Virginia" "Oklahoma" "Maryland" "Michigan"
[5] "Alaska" "South Dakota" "Minnesota" "Idaho"
[9] "Indiana" "Connecticut"

You can't sample more values than you have without allowing replacements:

> sample(state.name, 52)
Error in sample(state.name, 52) :
cannot take a sample larger than the population when 'replace = FALSE'

... but sampling exactly the number you do have is a great way to randomize the order of a vector. Here are the 50 states of the US, in random order:

> sample(state.name, 50)
[1] "California" "Iowa" "Hawaii"
[4] "Montana" "South Dakota" "North Dakota"
[7] "Louisiana" "Maine" "Maryland"
[10] "New Hampshire" "Rhode Island" "Texas"
[13] "Florida" "North Carolina" "Minnesota"
[16] "Arkansas" "Pennsylvania" "Colorado"
[19] "Idaho" "Connecticut" "Utah"
[22] "South Carolina" "Illinois" "Ohio"
[25] "New Jersey" "Indiana" "Wisconsin"
[28] "Mississippi" "Michigan" "Wyoming"
[31] "West Virginia" "Alaska" "Georgia"
[34] "Vermont" "Virginia" "Oklahoma"
[37] "Washington" "New Mexico" "New York"
[40] "Delaware" "Nevada" "Alabama"
[43] "Kentucky" "Missouri" "Oregon"
[46] "Tennessee" "Arizona" "Massachusetts"
[49] "Kansas" "Nebraska"

You could also have just used sample(state.name) to the same effect (a random reordering, though not the identical order each time) -- sampling as many values as provided is the default.
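The sketch below illustrates both points from this section: a shuffle keeps exactly the same items, and asking for more items than exist works once replacement is allowed. The variable names and the seed are assumptions made for illustration.

```r
# state.name ships with base R (datasets package). The seed is an arbitrary assumption.
set.seed(7)
shuffled <- sample(state.name)

length(shuffled) == length(state.name)   # TRUE: nothing added or dropped
setequal(shuffled, state.name)           # TRUE: same 50 names, reordered

# Sampling more than the population works once repeats are allowed.
oversample <- sample(state.name, 52, replace = TRUE)
length(oversample)                       # 52
anyDuplicated(oversample) > 0            # TRUE: 52 draws from 50 names must repeat
```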

Further reading

For more information about how R generates random numbers, check out the following help pages:

> ?runif
> ?sample
> ?.Random.seed

The last of these provides technical detail on the random number generator R uses, and how you can set the random seed (with set.seed) to reproduce a sequence of random numbers.
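As a minimal sketch of what setting the seed buys you, the following resets the generator to the same state twice and compares the results. The seed value 123 is an arbitrary choice for illustration.

```r
# set.seed() fixes the generator state; 123 is an arbitrary seed chosen for illustration.
set.seed(123)
a <- runif(3, 5.0, 7.5)
i <- sample(1:10, 5, replace = TRUE)

set.seed(123)                     # reset to the same state
b <- runif(3, 5.0, 7.5)
j <- sample(1:10, 5, replace = TRUE)

identical(a, b)                   # TRUE: same seed, same uniform draws
identical(i, j)                   # TRUE: same seed, same integer draws
```

The actual numbers can differ between R versions or generator settings, so it is safer to rely on repeatability within one setup than on specific values.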
