Cross-validation for Gradient Boosted Trees
Usage

cv.gbt(
  object,
  K = 5,
  repeats = 1,
  params = list(),
  nrounds = 500,
  early_stopping_rounds = 10,
  nthread = 12,
  train = NULL,
  type = "classification",
  trace = TRUE,
  seed = 1234,
  maximize = NULL,
  fun,
  ...
)
Arguments

| Argument | Description |
|---|---|
| object | Object of type "gbt" or "ranger" |
| K | Number of cross-validation folds to use (aka nfold) |
| repeats | Number of times to repeat the cross-validation |
| params | List of parameters (see the XGBoost documentation and the sketch after this table) |
| nrounds | Number of trees to create |
| early_stopping_rounds | Early stopping rule |
| nthread | Number of parallel threads to use. Defaults to 12 if available |
| train | An optional xgb.DMatrix object containing the original training data. Not needed when using Radiant's gbt function |
| type | Model type ("classification" or "regression") |
| trace | Print progress |
| seed | Random seed to use as the starting point |
| maximize | When a custom function is used, xgb.cv requires the user to indicate if the function output should be maximized (TRUE) or minimized (FALSE) |
| fun | Function to use for model evaluation (e.g., auc for classification and RMSE for regression) |
| ... | Additional arguments to be passed to 'fun' |
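When a params entry contains multiple candidate values, cv.gbt evaluates every combination. A minimal sketch of that grid expansion, assuming behavior equivalent to base R's expand.grid (the function's internals may differ):

## Hypothetical illustration of how a params list with vectors of candidate
## values expands into a tuning grid; cv.gbt's internal approach may differ
params <- list(max_depth = 1:3, learning_rate = c(0.1, 0.3))
grid <- expand.grid(params)
nrow(grid) # 6 combinations, each cross-validated across K folds (times `repeats`)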
Value

A data.frame sorted by the mean of the performance metric
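Because the returned data.frame is sorted by the mean of the performance metric, the best-performing settings appear in the first row. A minimal sketch of inspecting the result; the exact column layout is an assumption for illustration:

## Hypothetical follow-up: inspect cross-validation results and pick the top row
result <- gbt(dvd, "buy", c("coupon", "purch", "last"))
cv_results <- cv.gbt(result, params = list(max_depth = 1:6))
head(cv_results)        # best-performing parameter settings appear first
best <- cv_results[1, ] # top row after sorting by the mean metric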
Details

See https://radiant-rstats.github.io/docs/model/gbt.html for an example in Radiant
See also

gbt to generate an initial model that can be passed to cv.gbt

Rsq to calculate an R-squared measure for a regression

RMSE to calculate the Root Mean Squared Error for a regression

MAE to calculate the Mean Absolute Error for a regression

auc to calculate the area under the ROC curve for classification

profit to calculate profits for classification at a cost/margin threshold
Examples

if (FALSE) { # examples are not run by default
  ## classification: tune tree depth, then learning rate with a profit metric
  result <- gbt(dvd, "buy", c("coupon", "purch", "last"))
  cv.gbt(result, params = list(max_depth = 1:6))
  cv.gbt(result, params = list(max_depth = 1:6), fun = "logloss")
  cv.gbt(
    result,
    params = list(learning_rate = seq(0.1, 1.0, 0.1)),
    maximize = TRUE, fun = profit, cost = 1, margin = 5
  )

  ## regression: tune over a small grid, then compare metrics
  result <- gbt(diamonds, "price", c("carat", "color", "clarity"), type = "regression")
  cv.gbt(result, params = list(max_depth = 1:2, min_child_weight = 1:2))
  cv.gbt(result, params = list(learning_rate = seq(0.1, 0.5, 0.1)), fun = Rsq, maximize = TRUE)
  cv.gbt(result, params = list(learning_rate = seq(0.1, 0.5, 0.1)), fun = MAE, maximize = FALSE)

  ## custom evaluation function passed to XGBoost through eval_metric
  rig_wrap <- function(preds, dtrain) {
    labels <- xgboost::getinfo(dtrain, "label")
    value <- rig(preds, labels, lev = 1)
    list(metric = "rig", value = value)
  }
  result <- gbt(titanic, "survived", c("pclass", "sex"), eval_metric = rig_wrap, maximize = TRUE)
  cv.gbt(result, params = list(learning_rate = seq(0.1, 0.5, 0.1)))
}
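A note on the custom metric above: a function passed through eval_metric, like rig_wrap, follows XGBoost's custom evaluation (feval) interface; it receives the predictions and the xgb.DMatrix and must return list(metric = <name>, value = <number>). As noted under the maximize argument, the optimization direction must then be stated explicitly.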