rand_forest() generates a specification of a model before fitting and allows the model to be created using different packages in R or via Spark. The main arguments for the model are:

  • mtry: The number of predictors that will be randomly sampled at each split when creating the tree models.

  • trees: The number of trees contained in the ensemble.

  • min_n: The minimum number of data points in a node that are required for the node to be split further.

These arguments are converted to their engine-specific names at the time that the model is fit. Other options and arguments can be set using set_engine(). If left to their defaults here (NULL), the values are taken from the underlying model functions. If parameters need to be modified, update() can be used in lieu of recreating the object from scratch.
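The name conversion described above can be inspected before fitting. The following is a minimal sketch, assuming a parsnip version that provides set_engine() and translate(); the argument values and the ranger option importance = "impurity" are only illustrative choices.

```r
library(parsnip)

# One engine-agnostic specification using the three main arguments
spec <- rand_forest(mode = "regression", mtry = 3, trees = 500, min_n = 5)

# Engine-specific options (here ranger's `importance`) go through set_engine()
ranger_spec <- spec %>% set_engine("ranger", importance = "impurity")
rf_spec     <- spec %>% set_engine("randomForest")

# translate() shows the call template with the engine's own argument names
translate(ranger_spec)
translate(rf_spec)
```

With the ranger engine, trees and min_n should appear as num.trees and min.node.size; with randomForest they should appear as ntree and nodesize.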

rand_forest(mode = "unknown", mtry = NULL, trees = NULL,
  min_n = NULL)

# S3 method for rand_forest
update(object, mtry = NULL, trees = NULL,
  min_n = NULL, fresh = FALSE, ...)

Arguments

mode

A single character string for the type of model. Possible values for this model are "unknown", "regression", or "classification".

mtry

An integer for the number of predictors that will be randomly sampled at each split when creating the tree models.

trees

An integer for the number of trees contained in the ensemble.

min_n

An integer for the minimum number of data points in a node that are required for the node to be split further.

object

A random forest model specification.

fresh

A logical for whether the arguments should be modified in place or replaced wholesale.

...

Not used for update().

Details

The model can be fit with the fit() function using the following engines:

  • R: "ranger" or "randomForest"

  • Spark: "spark"
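As a rough illustration of choosing an engine and fitting, the sketch below uses the ranger engine on the built-in mtcars data. It assumes the ranger package is installed; the number of trees and the seed are arbitrary choices for the example.

```r
library(parsnip)

set.seed(1)

# Specify the model, pick an engine, then fit with the formula interface
rf_fit <- rand_forest(mode = "regression", trees = 100) %>%
  set_engine("ranger") %>%
  fit(mpg ~ ., data = mtcars)

# Predictions come back as a tibble with a .pred column
preds <- predict(rf_fit, new_data = head(mtcars))
preds
```

Swapping "ranger" for "randomForest" in set_engine() should leave the rest of the code unchanged, provided that package is installed.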

Note

For models created using the spark engine, there are several differences to consider. First, only the formula interface via fit() is available; using fit_xy() will generate an error. Second, the predictions will always be in a Spark table format. The names will be the same as documented but without the dots. Third, there is no equivalent to factor columns in Spark tables, so class predictions are returned as character columns. Fourth, to retain the model object for a new R session (via save), the model$fit element of the parsnip object should be serialized via ml_save(object$fit) and separately saved to disk. In a new session, the object can be reloaded and reattached to the parsnip object.

Engine Details

Engines may have pre-set default arguments when executing the model fit call. For this type of model, the templates of the fit calls are:

ranger classification

ranger::ranger(formula = missing_arg(), data = missing_arg(), 
    case.weights = missing_arg(), num.threads = 1, verbose = FALSE, 
    seed = sample.int(10^5, 1), probability = TRUE)

ranger regression

ranger::ranger(formula = missing_arg(), data = missing_arg(), 
    case.weights = missing_arg(), num.threads = 1, verbose = FALSE, 
    seed = sample.int(10^5, 1))

randomForest classification

randomForest::randomForest(x = missing_arg(), y = missing_arg())

randomForest regression

randomForest::randomForest(x = missing_arg(), y = missing_arg())

spark classification

sparklyr::ml_random_forest(x = missing_arg(), formula = missing_arg(), 
    type = "classification", seed = sample.int(10^5, 1))

spark regression

sparklyr::ml_random_forest(x = missing_arg(), formula = missing_arg(), 
    type = "regression", seed = sample.int(10^5, 1))

For ranger confidence intervals, the intervals are constructed using the form estimate +/- z * std_error. For classification probabilities, these values can fall outside of [0, 1] and will be coerced to be in this range.
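The interval construction can be sketched in base R. This is only an illustration of the formula above, not parsnip's internal code: the estimates, standard errors, and the 95% level are made-up values, and the use of qnorm() for z is an assumption.

```r
# Hypothetical class probability estimates and their standard errors
estimate  <- c(0.02, 0.50, 0.97)
std_error <- c(0.03, 0.05, 0.04)

# Two-sided normal quantile for an assumed 95% confidence level
level <- 0.95
z <- qnorm(1 - (1 - level) / 2)

# estimate +/- z * std_error, then coerced into [0, 1]
lower <- pmax(estimate - z * std_error, 0)
upper <- pmin(estimate + z * std_error, 1)

data.frame(estimate, lower, upper)
```

Here the first lower bound and the last upper bound would fall outside [0, 1] before truncation, so they are set to 0 and 1 respectively.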

Examples

rand_forest(mode = "classification", trees = 2000)
#> Random Forest Model Specification (classification)
#>
#> Main Arguments:
#>   trees = 2000
#>

# Parameters can be represented by a placeholder:
rand_forest(mode = "regression", mtry = varying())
#> Random Forest Model Specification (regression)
#>
#> Main Arguments:
#>   mtry = varying()
#>

model <- rand_forest(mtry = 10, min_n = 3)
model
#> Random Forest Model Specification (unknown)
#>
#> Main Arguments:
#>   mtry = 10
#>   min_n = 3
#>

update(model, mtry = 1)
#> Random Forest Model Specification (unknown)
#>
#> Main Arguments:
#>   mtry = 1
#>   min_n = 3
#>

update(model, mtry = 1, fresh = TRUE)
#> Random Forest Model Specification (unknown)
#>
#> Main Arguments:
#>   mtry = 1
#>