Cross-validation for Gradient Boosted Trees

cv.gbt(
  object,
  K = 5,
  repeats = 1,
  params = list(),
  nrounds = 500,
  early_stopping_rounds = 10,
  nthread = 12,
  train = NULL,
  type = "classification",
  trace = TRUE,
  seed = 1234,
  maximize = NULL,
  fun,
  ...
)

Arguments

object

Object of type "gbt" or "ranger"

K

Number of cross-validation folds to use (aka nfold)

repeats

Number of times to repeat the K-fold cross-validation

params

List of hyperparameters (see the XGBoost documentation). Supply a vector of values for one or more parameters to tune over; see the sketch after this argument list

nrounds

Number of trees to create

early_stopping_rounds

Stop training if the evaluation metric does not improve for this many consecutive rounds

nthread

Number of parallel threads to use. Defaults to 12 if available

train

An optional xgb.DMatrix object containing the original training data. Not needed when using Radiant's gbt function

type

Model type ("classification" or "regression")

trace

Print progress

seed

Random seed to use as the starting point

maximize

When a custom function is used, xgb.cv requires the user to indicate whether the function output should be maximized (TRUE) or minimized (FALSE)

fun

Function to use for model evaluation (e.g., auc for classification and RMSE for regression)

...

Additional arguments to be passed to 'fun'
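
A minimal sketch of grid tuning through params, assuming (as the examples below suggest) that every combination of the vector-valued entries is cross-validated, and that the dvd data from radiant.data is available. The specific values are illustrative, not recommendations:

## tune max_depth and learning_rate jointly; 3 x 5 = 15 candidate
## settings, each evaluated with 5-fold cross-validation
result <- gbt(dvd, "buy", c("coupon", "purch", "last"))
cv.gbt(
  result,
  K = 5,
  params = list(
    max_depth = c(2, 4, 6),
    learning_rate = seq(0.1, 0.5, 0.1)
  )
)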

Value

A data.frame sorted by the mean of the performance metric
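
Since the rows are sorted by the mean of the performance metric, the best-performing parameter combination should appear first. A hedged sketch of inspecting and reusing the result; the exact column names depend on the tuned parameters and metric and are assumptions here, so check names(cv_result) in your session:

cv_result <- cv.gbt(result, params = list(max_depth = 1:6))
head(cv_result, 3)      # top-ranked parameter combinations
best <- cv_result[1, ]  # best settings by mean metric
## refit with the winning value; the max_depth column name is an assumption
result_best <- gbt(dvd, "buy", c("coupon", "purch", "last"), max_depth = best$max_depth)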

Details

See https://radiant-rstats.github.io/docs/model/gbt.html for an example in Radiant

See also

gbt to generate an initial model that can be passed to cv.gbt

Rsq to calculate an R-squared measure for a regression

RMSE to calculate the Root Mean Squared Error for a regression

MAE to calculate the Mean Absolute Error for a regression

auc to calculate the area under the ROC curve for classification

profit to calculate profits for classification at a cost/margin threshold

Examples

if (FALSE) {
  ## classification: cross-validate a gbt model on the dvd data
  result <- gbt(dvd, "buy", c("coupon", "purch", "last"))
  cv.gbt(result, params = list(max_depth = 1:6))
  cv.gbt(result, params = list(max_depth = 1:6), fun = "logloss")
  cv.gbt(
    result,
    params = list(learning_rate = seq(0.1, 1.0, 0.1)),
    maximize = TRUE, fun = profit, cost = 1, margin = 5
  )

  ## regression: cross-validate a gbt model on the diamonds data
  result <- gbt(diamonds, "price", c("carat", "color", "clarity"), type = "regression")
  cv.gbt(result, params = list(max_depth = 1:2, min_child_weight = 1:2))
  cv.gbt(result, params = list(learning_rate = seq(0.1, 0.5, 0.1)), fun = Rsq, maximize = TRUE)
  cv.gbt(result, params = list(learning_rate = seq(0.1, 0.5, 0.1)), fun = MAE, maximize = FALSE)

  ## custom evaluation metric in the format xgboost expects
  rig_wrap <- function(preds, dtrain) {
    labels <- xgboost::getinfo(dtrain, "label")
    value <- rig(preds, labels, lev = 1)
    list(metric = "rig", value = value)
  }
  result <- gbt(titanic, "survived", c("pclass", "sex"), eval_metric = rig_wrap, maximize = TRUE)
  cv.gbt(result, params = list(learning_rate = seq(0.1, 0.5, 0.1)))
}