Cross-validation for a Random Forest

cv.rforest(
  object,
  K = 5,
  repeats = 1,
  mtry = 1:5,
  num.trees = NULL,
  min.node.size = 1,
  sample.fraction = NA,
  trace = TRUE,
  seed = 1234,
  fun,
  ...
)

## Arguments

object Object of type "rforest" or "ranger"

K Number of cross-validation passes to use

repeats Repeated cross-validation

mtry Number of variables to possibly split at in each node. Default is the (rounded down) square root of the number of variables

num.trees Number of trees to create

min.node.size Minimal node size

sample.fraction Fraction of observations to sample. Default is 1 for sampling with replacement and 0.632 for sampling without replacement

trace Print progress

seed Random seed to use as the starting point

fun Function to use for model evaluation (e.g., auc for classification and RMSE for regression)

... Additional arguments to be passed to 'fun'
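
As a rough illustration of how the tuning arguments combine, the sketch below builds a full grid with base R's `expand.grid` and counts the model fits implied by `K` and `repeats`. That cv.rforest evaluates every combination in exactly this way is an assumption, not something stated on this page; the candidate values are taken from the first example.

```r
# Candidate values for each tuning argument (taken from the first example)
grid <- expand.grid(
  mtry = 1:3,
  min.node.size = seq(1, 10, 5), # yields 1 and 6
  num.trees = c(100, 200),
  sample.fraction = 0.632
)

K <- 5       # cross-validation folds
repeats <- 1 # number of times the K-fold split is repeated

# Assumption: one model is fit per grid row, per fold, per repeat
n_fits <- nrow(grid) * K * repeats

print(grid)
print(n_fits)
```

Because the number of fits grows multiplicatively with each argument, it may be worth starting with a small grid and widening it only around the best-performing settings.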

## Value

A data.frame sorted by the mean of the performance metric
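
To make the shape of such a result concrete, the following sketch averages a metric over folds and sorts the summary. The per-fold values are simulated with `runif` purely for illustration, and the column names (`mean_auc`, `sd_auc`) are hypothetical rather than the names cv.rforest returns.

```r
set.seed(1234)

# Simulated per-fold AUC values for three mtry settings and K = 5 folds
fold_results <- data.frame(
  mtry = rep(1:3, each = 5),
  fold = rep(1:5, times = 3),
  auc = runif(15, min = 0.6, max = 0.9)
)

# Mean and standard deviation of the metric for each setting
summary_df <- aggregate(auc ~ mtry, data = fold_results, FUN = mean)
names(summary_df)[2] <- "mean_auc"
summary_df$sd_auc <- aggregate(auc ~ mtry, data = fold_results, FUN = sd)$auc

# Higher AUC is better, so sort in decreasing order; an error metric such as
# RMSE would be sorted in increasing order instead
summary_df <- summary_df[order(summary_df$mean_auc, decreasing = TRUE), ]
print(summary_df)
```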

## Details

Use rforest to generate an initial model that can be passed to cv.rforest.

Use Rsq to calculate an R-squared measure for a regression.

Use RMSE to calculate the Root Mean Squared Error for a regression.

Use MAE to calculate the Mean Absolute Error for a regression.

Use auc to calculate the area under the ROC curve for classification.

Use profit to calculate profits for classification at a cost/margin threshold.
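
For readers unfamiliar with the regression metrics, here is a minimal base-R sketch of what each one measures. These are not the package functions: the `_sketch` names, the `(pred, actual)` argument order, and the choice of squared correlation for R-squared are all assumptions made for illustration.

```r
# Illustrative base-R versions of the regression metrics; the _sketch suffix
# marks them as stand-ins, not the package implementations
rmse_sketch <- function(pred, actual) sqrt(mean((actual - pred)^2))
mae_sketch <- function(pred, actual) mean(abs(actual - pred))

# Assumption: R-squared taken as the squared correlation between
# predictions and observed values
rsq_sketch <- function(pred, actual) cor(pred, actual)^2

# Small made-up vectors to exercise the functions
actual <- c(3, 5, 7, 9)
pred <- c(2, 5, 8, 9)

print(rmse_sketch(pred, actual))
print(mae_sketch(pred, actual))
print(rsq_sketch(pred, actual))
```

RMSE penalizes large errors more heavily than MAE, which may matter when choosing which metric to pass as `fun`.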

## Examples

if (FALSE) {
  # Classification: tune mtry, min.node.size, and num.trees
  result <- rforest(dvd, "buy", c("coupon", "purch", "last"))
  cv.rforest(
    result, mtry = 1:3, min.node.size = seq(1, 10, 5),
    num.trees = c(100, 200), sample.fraction = 0.632
  )

  # Classification: the second call evaluates with profit
  result <- rforest(titanic, "survived", c("pclass", "sex"), max.depth = 1)
  cv.rforest(result, mtry = 1:3, min.node.size = seq(1, 10, 5))
  cv.rforest(result, mtry = 1:3, num.trees = c(100, 200), fun = profit, cost = 1, margin = 5)

  # Regression: the second call evaluates with Rsq
  result <- rforest(diamonds, "price", c("carat", "color", "clarity"), type = "regression")
  cv.rforest(result, mtry = 1:3, min.node.size = 1)
  cv.rforest(result, mtry = 1:3, min.node.size = 1, fun = Rsq)
}