Chapter 7 Standardized Argument Names

7.1 Dot Usage

If there is a possibility of argument name conflicts between the function and any arguments passed down through ..., it is strongly suggested that the argument names to the main function be prefixed with a dot (e.g., .data, .x).
When defining the order of arguments in a function, try to keep the ... as far to the left as possible. Arguments placed after ... cannot be matched by position or by partial name, so this forces users to explicitly name every argument that follows it.
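Both recommendations can be sketched with a small example. The helper summarize_cols() and the function shift_mean() are hypothetical and exist only for illustration; they are not part of any package.

```r
# Hypothetical helper: applies `.fn` to every column of a data frame.
# `.data` and `.fn` are dot-prefixed so they cannot collide with arguments
# that the caller forwards to `.fn` through `...`.
# `na_rm` sits to the right of `...`, so it must be named in full.
summarize_cols <- function(.data, .fn, ..., na_rm = FALSE) {
  if (na_rm) {
    .data <- stats::na.omit(.data)
  }
  vapply(.data, .fn, numeric(1), ...)
}

df <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, NA))

# Hypothetical function with an argument named `x`. If the main function
# had used `x` instead of `.data`, `x = 1` would never reach `shift_mean()`.
shift_mean <- function(col, x) mean(col - x)

result <- summarize_cols(df, shift_mean, x = 1, na_rm = TRUE)
```

Here x = 1 is absorbed by ... and forwarded to shift_mean(), while na_rm = TRUE is matched only because it is spelled out by name.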

7.2 Data Arguments

na_rm: missing data handling.
new_data: data to be predicted.
weights: case weights.
For .data.frame methods:
- x: predictors or generic data objects.
- y: outcome data.
For .formula methods:
- formula: a y ~ x formula specifying the outcome and predictors.
- data: the data.frame to pull formula variables from.
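As a rough sketch of how these names fit together, the following defines a generic with a data frame method and a formula method. The generic fit_mean() and its behavior (a weighted mean of the outcome) are invented for illustration and are not taken from any package.

```r
# Hypothetical generic; `fit_mean` and its methods are illustrative only.
fit_mean <- function(x, ...) {
  UseMethod("fit_mean")
}

# Data frame method: predictors in `x`, outcome in `y`.
fit_mean.data.frame <- function(x, y, weights = NULL, na_rm = FALSE, ...) {
  if (is.null(weights)) {
    weights <- rep(1, length(y))
  }
  if (na_rm) {
    keep <- !is.na(y)
    y <- y[keep]
    weights <- weights[keep]
  }
  list(
    estimate = stats::weighted.mean(y, weights),
    predictors = names(x)
  )
}

# Formula method: `formula` names the outcome and predictors found in `data`.
fit_mean.formula <- function(formula, data, weights = NULL, na_rm = FALSE, ...) {
  # Keep missing values so that `na_rm` decides how they are handled.
  mf <- stats::model.frame(formula, data, na.action = stats::na.pass)
  y <- stats::model.response(mf)
  x <- mf[, -1, drop = FALSE]
  fit_mean(x, y = y, weights = weights, na_rm = na_rm, ...)
}

d <- data.frame(y = c(1, 2, 3, NA), x1 = c(4, 5, 6, 7))

fit_xy <- fit_mean(d["x1"], y = d$y, na_rm = TRUE)
fit_f <- fit_mean(y ~ x1, data = d, na_rm = TRUE)
```

Both calls are intended to give the same estimate; the formula method only translates its inputs into x and y and then delegates to the data frame method.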

7.3 Numerical Arguments

times: the number of bootstraps, simulations, or other replications.
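A minimal sketch of the times argument in use, assuming a hypothetical bootstrap helper named boot_means() that is not part of any package:

```r
# Hypothetical helper: returns the mean of each bootstrap resample of `x`.
# `times` is the number of bootstrap replications.
boot_means <- function(x, times = 25) {
  n <- length(x)
  vapply(
    seq_len(times),
    # Resample indices with replacement, then average the resampled values.
    function(i) mean(x[sample.int(n, size = n, replace = TRUE)]),
    numeric(1)
  )
}

set.seed(123)
means <- boot_means(c(2, 4, 4, 5, 7, 9), times = 200)

# One bootstrap mean per replication.
length(means)
```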

7.4 Statistical Quantities

direction: the type of hypothesis test alternative.
level: interval levels (e.g., confidence, credible, or prediction).
link: link functions for generalized linear models.
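The sketch below shows one way a wrapper might expose direction and level while delegating to stats::t.test(), whose own arguments are named alternative and conf.level. The wrapper mean_test() is hypothetical, and the set of allowed direction values is an assumption borrowed from t.test() rather than something this chapter prescribes.

```r
# Hypothetical wrapper exposing the standardized names `direction` and `level`.
mean_test <- function(x,
                      direction = c("two.sided", "less", "greater"),
                      level = 0.95) {
  # Allowed values are assumed to mirror t.test()'s `alternative` argument.
  direction <- match.arg(direction)
  res <- stats::t.test(x, alternative = direction, conf.level = level)
  list(
    p_value = res$p.value,
    interval = as.numeric(res$conf.int),
    direction = direction,
    level = level
  )
}

out <- mean_test(c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3), direction = "greater", level = 0.9)
```

With a one-sided alternative, t.test() returns an interval that is unbounded on one side, so out$interval has an infinite upper limit in this call.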

7.5 Tuning Parameters

activation: the type of activation function between network layers.
cost: a cost value for SVM models.
Cp: the cost-complexity parameter in classical CART models.
deg_free: a parameter for the degrees of freedom.
degree: the polynomial degree.
dropout: the dropout rate, i.e., the proportion of model parameters randomly set to zero during training.
epochs: the number of iterations of training.
hidden_units: the number of hidden units in a network layer.
Laplace: the Laplace correction used to smooth low-frequency counts.
learn_rate: the rate at which the boosting algorithm adapts from iteration to iteration.
loss_reduction: the reduction in the loss function required to split further.
min_n: the minimum number of data points in a node that are required for the node to be split further.
mixture: the proportion of L1 regularization in the model.
mtry: the number of predictors that will be randomly sampled at each split when creating the tree models.
neighbors: a parameter for the number of neighbors used in a prototype model.
num_comp: the number of components in a model (e.g. PCA or PLS components).
num_terms: a nonspecific parameter for the number of terms in a model. This can be used with models that include feature selection, such as MARS.
prod_degree: the number of terms to combine into interactions. A value of 1 implies an additive model. Useful for MARS models and some linear models.
prune: a logical for whether a tree or set of rules should be pruned.
rbf_sigma: the sigma parameter of a radial basis function.
penalty: the amount of regularization used. In cases where different penalty types need to be distinguished, the names L1 and L2 are recommended.
sample_size: the size of the data set used for modeling within an iteration of the modeling algorithm, such as stochastic gradient boosting.
surv_dist: the statistical distribution of the data in a survival analysis model.
tree_depth: the maximum depth of the tree (i.e., number of splits).
trees: the number of trees contained in a random forest or boosted ensemble. In the latter case, this is equal to the number of boosting iterations.
weight_func: the type of kernel function that weights the distances between samples (e.g., in a K-nearest neighbors model).
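One reason to standardize these names is that underlying packages often use different names for the same parameter. The sketch below shows a simple lookup that translates standardized names into engine-specific ones. The helper translate_args() and the engine_args table are hypothetical; the mapped names are the argument names used by ranger::ranger() and randomForest::randomForest(), and neither package needs to be installed to run the example.

```r
# Assumed mapping from standardized names to engine argument names.
engine_args <- list(
  ranger = c(trees = "num.trees", mtry = "mtry", min_n = "min.node.size"),
  randomForest = c(trees = "ntree", mtry = "mtry", min_n = "nodesize")
)

# Hypothetical helper: renames a list of standardized arguments for an engine.
translate_args <- function(args, engine) {
  lookup <- engine_args[[engine]]
  if (is.null(lookup)) {
    stop("Unknown engine: ", engine)
  }
  unknown <- setdiff(names(args), names(lookup))
  if (length(unknown) > 0) {
    stop("No translation for: ", paste(unknown, collapse = ", "))
  }
  names(args) <- unname(lookup[names(args)])
  args
}

std_args <- list(trees = 500, mtry = 3, min_n = 5)
ranger_args <- translate_args(std_args, engine = "ranger")
rf_args <- translate_args(std_args, engine = "randomForest")
```

The translated list could then be handed to the engine's fitting function with do.call(), so user-facing code only ever sees the standardized names.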

7.6 Others

fn and fns: used when a single function or multiple functions, respectively, are passed as arguments.
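A minimal sketch of the distinction, using two hypothetical helpers, apply_fn() and apply_fns(), that are not from any package:

```r
# `fn`: a single function applied to `x`.
apply_fn <- function(x, fn) {
  fn(x)
}

# `fns`: a (named) list of functions, each applied to `x`.
apply_fns <- function(x, fns) {
  vapply(fns, function(fn) fn(x), numeric(1))
}

values <- c(1, 2, 3)

single <- apply_fn(values, fn = mean)
several <- apply_fns(values, fns = list(mean = mean, max = max))
```

Here single is one number, while several is a named numeric vector with one element per function in fns.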