In recipes, there are no constraints related to the order in which steps are added to the recipe. However, there are some general suggestions that you should consider:
- If using a Box-Cox transformation, don’t center the data first or do any operations that might make the data non-positive. Alternatively, use the Yeo-Johnson transformation so you don’t have to worry about this.
- Recipes do not automatically create dummy variables (unlike most formula methods). If you want to center, scale, or do any other operations on all of the predictors, run
step_dummy first so that numeric columns are in the data set instead of factors.
- As noted in the help file for
step_interact, you should make dummy variables before creating the interactions.
- If you are lumping infrequently categories together with
While your project’s needs may vary, here is a suggested order of potential steps that should work for most problems:
- Individual transformations for skewness and other issues
- Discretize (if needed and if you have no other choice)
- Create dummy variables
- Create interactions
- Normalization steps (center, scale, range, etc)
- Multivariate transformation (e.g. PCA, spatial sign, etc)
Again, your milage may vary for your particular problem.