yardstick is a package to estimate how well models are working using tidy data principles. See the package webpage for more information.


To install the package:
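
A typical installation might look like the following. This is a sketch: it assumes the release version is taken from CRAN, and the development-version line assumes the source lives in the `tidymodels/yardstick` GitHub repository and that the `pak` package is used to install it; neither detail is stated in this section.

```r
# Release version from CRAN
install.packages("yardstick")

# Development version (assumed location: tidymodels/yardstick on GitHub)
# install.packages("pak")
# pak::pak("tidymodels/yardstick")
```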

Two-class metrics

For example, suppose you create a classification model and predict on a new data set. You might have data that looks like this:
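
As an illustration, the `two_class_example` data set that ships with yardstick can stand in for such predictions. Using it here is an assumption about what the example was meant to show; any data frame with an observed-class column and prediction columns would do.

```r
library(yardstick)

# two_class_example ships with yardstick:
#   truth          observed class (factor)
#   Class1, Class2 predicted class probabilities
#   predicted      hard class prediction (factor)
head(two_class_example)
```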

You can use a dplyr-like syntax to compute common performance characteristics of the model and get them back in a data frame:
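
A minimal sketch of that syntax, assuming the `two_class_example` data bundled with yardstick and the `metrics()` and `roc_auc()` functions; the variable names `class_metrics` and `auc` are illustrative only.

```r
library(yardstick)
library(dplyr)

# metrics() chooses a default set of metrics from the outcome type; for a
# factor outcome with hard class predictions it returns accuracy and kappa.
class_metrics <- metrics(two_class_example, truth, predicted)
class_metrics

# Individual metric functions follow the same data / truth / estimate pattern.
# roc_auc() takes the probability column of the event class.
auc <- two_class_example %>%
  roc_auc(truth, Class1)
auc
```

Each call returns a data frame with `.metric`, `.estimator`, and `.estimate` columns, so results can be bound together or piped into further dplyr steps.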

Calculating metrics on resamples

If you have multiple resamples of a model, you can use a metric on a grouped data frame to calculate the metric across all resamples at once.
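
A sketch of the grouped approach, assuming the `hpc_cv` data set bundled with yardstick and `roc_auc()` as the metric; `resample_auc` is an illustrative name.

```r
library(yardstick)
library(dplyr)

# hpc_cv ships with yardstick: `obs` is the observed class, `VF`, `F`, `M`, `L`
# are class probabilities, and `Resample` identifies the cross-validation fold.
resample_auc <- hpc_cv %>%
  group_by(Resample) %>%
  roc_auc(obs, VF:L)

# One row per resample
resample_auc
```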

This calculates the multiclass ROC AUC using the method described in Hand and Till (2001), and does so across all 10 resamples at once.

Autoplot methods for easy visualization

Curve-based functions such as roc_curve(), pr_curve() and gain_curve() all have ggplot2::autoplot() methods, so their results can be plotted with a single call.


# Assumes yardstick, dplyr, and ggplot2 are attached
hpc_cv %>%
  group_by(Resample) %>%
  roc_curve(obs, VF:L) %>%
  autoplot()