Wednesday, April 21, 2010

Extracting a random percentage of data by group

1) Randomly choose 10% of the data from each group (here, taking the mean "age" of a 10% sample within each group).

> x <- data.frame(group=sample(1:4,100,TRUE), age=runif(100,4,80))
> tapply(x$age, x$group, function(z) mean(z[sample(seq_along(z), length(z) / 10)]))
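
The tapply() call above returns the mean age of each 10% sample, not the sampled rows. If the rows themselves are wanted, something like the following might work. It is only a sketch: the names idx and sampled are my own, and using ceiling() (so that a small group still contributes at least one row) is an assumption about what is wanted.

```r
set.seed(1)  # only so the example is reproducible
x <- data.frame(group = sample(1:4, 100, TRUE), age = runif(100, 4, 80))

# For each group, draw about 10% of its row indices (at least one row).
# Indexing i[...] avoids the sample() surprise when a group has a single row.
idx <- unlist(lapply(split(seq_len(nrow(x)), x$group), function(i) {
  i[sample.int(length(i), ceiling(length(i) / 10))]
}))

sampled <- x[idx, ]   # the sampled rows, with both columns kept
table(sampled$group)  # number of rows kept per group
```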


2) Split a dataset randomly into 2 parts: a prediction set (with about 2/3 of the data) and a validation set (with about 1/3 of the data).

> x <- 1:100 # test data
> y <- split(x, sample(1:2, length(x), replace=TRUE, prob=c(1,2))) # y[["2"]] gets about 2/3 of x, y[["1"]] about 1/3
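
Because each element is assigned to a part independently, the split() approach gives the 1/3 and 2/3 proportions only on average. If exact sizes matter, one option is to draw the prediction indices without replacement and take the rest as the validation set. A rough sketch, where n_pred, pred_idx, prediction and validation are just illustrative names:

```r
set.seed(1)  # only so the example is reproducible
x <- 1:100   # test data

n_pred <- round(2 * length(x) / 3)        # size of the prediction set
pred_idx <- sample(seq_along(x), n_pred)  # draw indices without replacement
prediction <- x[pred_idx]
validation <- x[-pred_idx]                # everything that was not drawn
```

For a data frame the same indices would be used on the rows, e.g. df[pred_idx, ] and df[-pred_idx, ].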

3) I would like to randomly divide this data frame in half. How do I select the rows that were not selected and assign them to randomsample2?

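> selected <- rep(0, 39622)
> selected[sample(1:39622, 39622/2)] <- 1 # mark exactly half of the rows
> data$selected <- selected
> rm(selected)

or, for a split that is only approximately half:

> data$selected <- rbinom(39622, 1, .5)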
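
Once the 0/1 indicator is in place, the two halves can be pulled out by subsetting on it. A small sketch, using a made-up 10-row data frame as a stand-in for the real 39622-row one (the name randomsample1 is assumed here to match randomsample2 from the question):

```r
set.seed(1)  # only so the example is reproducible
data <- data.frame(id = 1:10, value = rnorm(10))  # stand-in for the real data

n <- nrow(data)
data$selected <- 0
data$selected[sample(n, n %/% 2)] <- 1       # mark exactly half of the rows

randomsample1 <- data[data$selected == 1, ]  # the selected half
randomsample2 <- data[data$selected == 0, ]  # the rows that were not selected
```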
