Chapter 7 Standardized Argument Names
7.1 Dot Usage
If there is a possibility of argument name conflicts between the function and any arguments passed down through ..., it is strongly suggested that the argument names to the main function be prefixed with a dot (e.g., .data, .x, etc.).
When defining the order of arguments in a function, try to keep the ... as far to the left as possible so that users must explicitly name every argument that follows the ... argument.
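Both recommendations can be seen in a small sketch. The helper `summarize_with()` is hypothetical and exists only for illustration; it assumes nothing beyond base R.

```r
# Hypothetical helper, for illustration only.
# The wrapper's own arguments are dot-prefixed, and `.fn` sits to the right
# of `...`, so it can only be supplied by its full name.
summarize_with <- function(.x, ..., .fn = mean) {
  # Everything captured by `...` is forwarded to `.fn` untouched.
  .fn(.x, ...)
}

vals <- c(1, 2, NA, 4)

# `na.rm` travels through `...` to mean().
avg <- summarize_with(vals, na.rm = TRUE)

# `.fn` must be named explicitly because it follows `...`.
med <- summarize_with(vals, na.rm = TRUE, .fn = median)

# A forwarded argument called `x` does not collide with `.x`.
shifted <- summarize_with(1:3, x = 10, .fn = function(v, x) sum(v) + x)
```

Had the wrapper used `x` instead of `.x`, the last call would have matched `x = 10` to the wrapper's own argument rather than forwarding it.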
7.2 Data Arguments
na_rm: missing data handling.
new_data: data to be predicted.
weights: case weights.
For .data.frame methods:
  x: predictors or generic data objects.
  y: outcome data.
For .formula methods:
  formula: a y ~ x formula specifying the outcome and predictors.
  data: the data.frame to pull formula variables from.
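As a rough sketch of how these names fit together, the following defines a hypothetical S3 generic, `wt_mean_fit()`, with a data.frame method and a formula method. The "model" is just a weighted mean of the outcome, and every function name here is an assumption made for illustration.

```r
# Hypothetical generic: "fits" a weighted mean of the outcome.
wt_mean_fit <- function(x, ...) {
  UseMethod("wt_mean_fit")
}

# data.frame method: `x` holds the predictors, `y` the outcome.
wt_mean_fit.data.frame <- function(x, y, weights = NULL, na_rm = FALSE, ...) {
  if (is.null(weights)) weights <- rep(1, length(y))
  if (na_rm) {
    keep <- !is.na(y)
    y <- y[keep]
    weights <- weights[keep]
  }
  structure(
    list(estimate = sum(weights * y) / sum(weights), predictors = names(x)),
    class = "wt_mean_fit"
  )
}

# formula method: `formula` is y ~ x, `data` supplies the variables.
wt_mean_fit.formula <- function(formula, data, weights = NULL, na_rm = FALSE, ...) {
  # Keep rows with missing values so that `na_rm` decides how to handle them.
  mf <- stats::model.frame(formula, data = data, na.action = stats::na.pass)
  y <- stats::model.response(mf)
  x <- mf[, -1, drop = FALSE]
  wt_mean_fit(x, y = y, weights = weights, na_rm = na_rm, ...)
}

# Prediction takes the data to be predicted as `new_data`.
predict.wt_mean_fit <- function(object, new_data, ...) {
  rep(object$estimate, nrow(new_data))
}

df <- data.frame(y = c(1, 2, NA, 4), x1 = c(10, 20, 30, 40))
fit <- wt_mean_fit(y ~ x1, data = df, na_rm = TRUE)
preds <- predict(fit, new_data = df[1:2, ])
```

The formula method only translates its inputs and then delegates to the data.frame method, so the argument names stay consistent across both interfaces.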
7.3 Numerical Arguments
times: the number of bootstraps, simulations, or other replications.
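For instance, a bootstrap helper might expose the number of resamples as `times`. The function `boot_means()` below is a hypothetical sketch, not part of any package.

```r
# Hypothetical helper: bootstrap the sample mean `times` times.
boot_means <- function(x, times = 25) {
  vapply(
    seq_len(times),
    function(i) mean(sample(x, replace = TRUE)),
    numeric(1)
  )
}

set.seed(1)
means <- boot_means(rnorm(20), times = 10)
```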
7.4 Statistical Quantities
direction: the type of hypothesis test alternative.
level: interval levels (e.g., confidence, credible, prediction, and so on).
link: link functions for generalized linear models.
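A minimal sketch of `direction` and `level` in use, wrapping `stats::t.test()`. The wrapper `mean_test()` and the accepted `direction` values are assumptions chosen for illustration; `link` is not covered here.

```r
# Hypothetical wrapper around stats::t.test() using the standardized names.
mean_test <- function(x,
                      direction = c("two-sided", "less", "greater"),
                      level = 0.95) {
  direction <- match.arg(direction)
  # Translate to the spelling that t.test() expects.
  alternative <- if (direction == "two-sided") "two.sided" else direction
  res <- stats::t.test(x, alternative = alternative, conf.level = level)
  list(
    p_value = res$p.value,
    interval = as.numeric(res$conf.int),
    level = level
  )
}

x <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)
out <- mean_test(x, direction = "greater", level = 0.90)
```

The wrapper keeps the underlying function's argument names (`alternative`, `conf.level`) out of the user-facing interface.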
7.5 Tuning Parameters
activation: the type of activation function between network layers.
cost: a cost value for SVM models.
Cp: the cost-complexity parameter in classical CART models.
deg_free: a parameter for the degrees of freedom.
degree: the polynomial degree.
dropout: the parameter dropout rate.
epochs: the number of iterations of training.
hidden_units: the number of hidden units in a network layer.
Laplace: the Laplace correction used to smooth low-frequency counts.
learn_rate: the rate at which the boosting algorithm adapts from iteration to iteration.
loss_reduction: the reduction in the loss function required to split further.
min_n: the minimum number of data points in a node that are required for the node to be split further.
mixture: the proportion of L1 regularization in the model.
mtry: the number of predictors that will be randomly sampled at each split when creating the tree models.
neighbors: a parameter for the number of neighbors used in a prototype model.
num_comp: the number of components in a model (e.g., PCA or PLS components).
num_terms: a nonspecific parameter for the number of terms in a model. This can be used with models that include feature selection, such as MARS.
prod_degree: the number of terms to combine into interactions. A value of 1 implies an additive model. Useful for MARS models and some linear models.
prune: a logical for whether a tree or set of rules should be pruned.
rbf_sigma: the sigma parameters of a radial basis function.
penalty: the amount of regularization used. In cases where different penalty types need to be differentiated, the names L1 and L2 are recommended.
sample_size: the size of the data set used for modeling within an iteration of the modeling algorithm, such as stochastic gradient boosting.
surv_dist: the statistical distribution of the data in a survival analysis model.
tree_depth: the maximum depth of the tree (i.e., number of splits).
trees: the number of trees contained in a random forest or boosted ensemble. In the latter case, this is equal to the number of boosting iterations.
weight_func: the type of kernel function that weights the distances between samples (e.g., in a K-nearest neighbors model).
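One way a package might honor these names is to accept them in its own interface and translate them to whatever the underlying engine calls them. The sketch below is illustrative only: `translate_args()` is a hypothetical helper, and the key assumes the `maxdepth`, `minsplit`, and `cp` arguments of `rpart::rpart.control()` as the target names. The rpart package itself is not loaded or called.

```r
# Assumed mapping from the standardized names to rpart's control arguments.
rpart_key <- c(tree_depth = "maxdepth", min_n = "minsplit", Cp = "cp")

# Hypothetical helper: rename list elements according to `key`,
# leaving any unrecognized names unchanged.
translate_args <- function(args, key) {
  known <- names(args) %in% names(key)
  names(args)[known] <- unname(key[names(args)[known]])
  args
}

engine_args <- translate_args(
  list(tree_depth = 5, min_n = 10, Cp = 0.01),
  rpart_key
)
```

Keeping the mapping in one named vector means a second engine only needs a second key, not a second interface.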
7.6 Others
fn and fns: used when a single function or multiple functions, respectively, are passed as arguments.
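The distinction can be sketched with two hypothetical helpers, `apply_fn()` and `apply_fns()`, which are not taken from any package.

```r
# Hypothetical helper: `fn` is a single function.
apply_fn <- function(x, fn = mean) {
  fn(x)
}

# Hypothetical helper: `fns` is a named list of functions,
# each assumed to return a single number.
apply_fns <- function(x, fns = list(mean = mean, sd = sd)) {
  vapply(fns, function(f) f(x), numeric(1))
}

single <- apply_fn(c(1, 2, 3))
several <- apply_fns(c(1, 2, 3))
```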