Chapter 7 Standardized Argument Names

7.1 Dot Usage

  • If argument names in the main function could conflict with any arguments passed down through ..., it is strongly suggested that the main function's argument names be prefixed with a dot (e.g. .data, .x).

  • When defining the order of arguments in a function, place ... as far to the left as possible; every argument to the right of ... must then be named explicitly by the user.
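Both points can be sketched with a small base R example. The helper apply_fn() and its arguments are hypothetical names chosen for illustration, not part of any package. The dot prefixes leave plain names such as x free to travel through ..., and the option placed after ... is only reachable by name.

```r
# Hypothetical helper: applies `.f` to `.x`, forwarding `...` to `.f`.
# The dot prefixes keep `.x` and `.f` from capturing argument names
# that the caller intends for `.f`.
apply_fn <- function(.x, .f, ..., .na_rm = FALSE) {
  if (.na_rm) {
    .x <- .x[!is.na(.x)]
  }
  .f(.x, ...)
}

# `x = 2` does not match `.x`, so it is passed down through `...`.
scaled <- apply_fn(1:3, function(v, x) v * x, x = 2)

# `.na_rm` sits to the right of `...`, so it has to be named.
named <- apply_fn(c(1, NA, 3), sum, .na_rm = TRUE)

# An unnamed TRUE is swallowed by `...` and handed to sum() as data;
# `.na_rm` keeps its default and the NA propagates.
unnamed <- apply_fn(c(1, NA, 3), sum, TRUE)
```

Had the first argument been called x rather than .x, the call with x = 2 would have bound 2 to the data argument instead of forwarding it.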

7.2 Data Arguments

  • na_rm: missing data handling.

  • new_data: data to be predicted.

  • weights: case weights.

  • For .data.frame methods:

    • x: predictors or generic data objects.

    • y: outcome data.

  • For .formula methods:

    • formula: a y ~ x formula specifying the outcome and predictors.

    • data: the data.frame to pull formula variables from.
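As a rough illustration of how these names fit together, the sketch below defines a toy S3 generic with a data.frame method and a formula method. The generic fit_center(), the class center_fit, and the model itself (a weighted mean of the outcome) are assumptions made up for this example.

```r
# Hypothetical generic for illustration; not taken from any package.
fit_center <- function(x, ...) {
  UseMethod("fit_center")
}

# data.frame method: predictors in `x`, outcome in `y`.
fit_center.data.frame <- function(x, y, ..., weights = NULL, na_rm = FALSE) {
  if (is.null(weights)) {
    weights <- rep(1, length(y))
  }
  if (na_rm) {
    # Drop cases with a missing outcome (the only missingness handled here).
    keep <- !is.na(y)
    y <- y[keep]
    weights <- weights[keep]
  }
  structure(
    list(estimate = weighted.mean(y, weights), predictors = names(x)),
    class = "center_fit"
  )
}

# formula method: `formula` names the variables, `data` supplies them.
fit_center.formula <- function(formula, data, ..., weights = NULL, na_rm = FALSE) {
  mf <- model.frame(formula, data = data, na.action = na.pass)
  fit_center(
    mf[, -1, drop = FALSE],
    model.response(mf),
    weights = weights,
    na_rm = na_rm
  )
}

# `new_data` holds the rows to predict; this toy model predicts a constant.
predict.center_fit <- function(object, new_data, ...) {
  rep(object$estimate, nrow(new_data))
}

df <- data.frame(y = c(1, 2, NA, 5), x1 = c(10, 20, 30, 40))
fit_xy <- fit_center(df["x1"], df$y, na_rm = TRUE)
fit_formula <- fit_center(y ~ x1, df, na_rm = TRUE)
preds <- predict(fit_formula, new_data = df[1:2, ])
```

The formula is passed positionally in this sketch because S3 dispatch happens on the generic's first argument.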

7.3 Numerical Arguments

  • times: the number of bootstraps, simulations, or other replications.

7.4 Statistical Quantities

  • direction: the type of hypothesis test alternative.

  • level: interval levels (e.g., confidence, credible, prediction, and so on).

  • link: link functions for generalized linear models.
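The names times and level might be combined as in the following sketch of a percentile bootstrap interval. The function boot_interval() and its defaults are hypothetical and only meant to show the argument names in use.

```r
# Hypothetical helper: percentile bootstrap interval for a statistic.
# `times` is the number of bootstrap replications; `level` is the
# interval level.
boot_interval <- function(x, fn = mean, times = 1000, level = 0.95) {
  stopifnot(times >= 1, level > 0, level < 1)
  stats <- vapply(
    seq_len(times),
    # Resample indices with replacement, then compute the statistic.
    function(i) fn(x[sample.int(length(x), replace = TRUE)]),
    numeric(1)
  )
  alpha <- 1 - level
  quantile(stats, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(1)
ci <- boot_interval(rnorm(50), times = 200, level = 0.90)
```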

7.5 Tuning Parameters

  • activation: the type of activation function between network layers.

  • cost: a cost value for SVM models.

  • Cp: the cost-complexity parameter in classical CART models.

  • deg_free: a parameter for the degrees of freedom.

  • degree: the polynomial degree.

  • dropout: the dropout rate, i.e. the proportion of parameters randomly set to zero during training.

  • epochs: the number of iterations of training.

  • hidden_units: the number of hidden units in a network layer.

  • Laplace: the Laplace correction used to smooth low-frequency counts.

  • learn_rate: the rate at which the boosting algorithm adapts from iteration to iteration.

  • loss_reduction: the reduction in the loss function required to split further.

  • min_n: the minimum number of data points in a node that are required for the node to be split further.

  • mixture: the proportion of L1 regularization in the model.

  • mtry: the number of predictors that will be randomly sampled at each split when creating the tree models.

  • neighbors: a parameter for the number of neighbors used in a prototype model.

  • num_comp: the number of components in a model (e.g. PCA or PLS components).

  • num_terms: a nonspecific parameter for the number of terms in a model. This can be used with models that include feature selection, such as MARS.

  • prod_degree: the number of terms to combine into interactions. A value of 1 implies an additive model. Useful for MARS models and some linear models.

  • prune: a logical for whether a tree or set of rules should be pruned.

  • rbf_sigma: the sigma parameter of a radial basis function.

  • penalty: the amount of regularization used. In cases where different penalty types need to be differentiated, the names L1 and L2 are recommended.

  • sample_size: the size of the data set used for modeling within an iteration of the modeling algorithm, such as stochastic gradient boosting.

  • surv_dist: the statistical distribution of the data in a survival analysis model.

  • tree_depth: the maximum depth of the tree (i.e. number of splits).

  • trees: the number of trees contained in a random forest or boosted ensemble. In the latter case, this is equal to the number of boosting iterations.

  • weight_func: the type of kernel function that weights the distances between samples (e.g. in a K-nearest neighbors model).
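One way a package might use these standardized names is to accept them in its own interface and translate them to whatever the underlying implementation expects. The sketch below assumes a hypothetical translate_args() helper and a lookup table; the engine-side names are those used by the ranger package as best recalled here and should be checked against its documentation.

```r
# Assumed lookup table: standardized name -> engine-specific name.
# The right-hand side reflects ranger's arguments as recalled; verify
# against the ranger documentation before relying on it.
ranger_names <- c(
  trees = "num.trees",
  mtry  = "mtry",
  min_n = "min.node.size"
)

# Hypothetical helper: renames a list of standardized arguments.
translate_args <- function(args, lookup) {
  unknown <- setdiff(names(args), names(lookup))
  if (length(unknown) > 0) {
    stop("Unknown standardized argument(s): ", paste(unknown, collapse = ", "))
  }
  names(args) <- unname(lookup[names(args)])
  args
}

engine_args <- translate_args(list(trees = 500, min_n = 5), ranger_names)
```

The translated list could then be handed to the engine with do.call(), which keeps the user-facing names stable even if the engine changes.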

7.6 Others

  • fn and fns: used when a single function or multiple functions, respectively, are passed as arguments.
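The distinction can be sketched as follows; apply_one() and apply_many() are hypothetical helpers, and the sketch assumes each function returns a single number.

```r
# `fn`: exactly one function is expected.
apply_one <- function(x, fn) {
  fn(x)
}

# `fns`: a (preferably named) list of functions is expected.
apply_many <- function(x, fns) {
  vapply(fns, function(fn) fn(x), numeric(1))
}

single <- apply_one(c(1, 2, 3, 4), fn = mean)
several <- apply_many(c(1, 2, 3, 4), fns = list(mean = mean, max = max))
```

With a named list, the names carry through to the result, so each value can be traced back to the function that produced it.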