A bootstrap sample is a sample of the same size as the original data set, drawn with replacement. As a result, the analysis sample contains multiple replicates of some of the original rows. The assessment set is defined as the rows of the original data that were not included in the bootstrap sample; it is often referred to as the "out-of-bag" (OOB) sample.
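The relationship between a bootstrap sample and its out-of-bag rows can be sketched in base R. This is an illustration only and is not meant to reflect how bootstraps() is implemented; the names n, boot_idx, and oob_idx are made up for the example.

```r
# Illustrative sketch in base R; not the internals of bootstraps().
set.seed(1)
n <- nrow(mtcars)

# Draw n row indices with replacement: this is the analysis sample.
boot_idx <- sample.int(n, size = n, replace = TRUE)

# Rows that were never drawn form the assessment (out-of-bag) sample.
oob_idx <- setdiff(seq_len(n), boot_idx)

analysis_rows   <- mtcars[boot_idx, ]
assessment_rows <- mtcars[oob_idx, ]

# The analysis sample always has n rows, some of them repeated.
nrow(analysis_rows)

# Every original row is either drawn at least once or out-of-bag.
length(unique(boot_idx)) + length(oob_idx)
```

Because the draw is random, the size of the out-of-bag set varies from one bootstrap sample to the next, which is why the assessment counts differ between resamples in the Examples.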

bootstraps(data, times = 25, strata = NULL, apparent = FALSE, ...)

Arguments

data

A data frame.

times

The number of bootstrap samples.

strata

A variable that is used to conduct stratified sampling. When not NULL, each bootstrap sample is created within the stratification variable.

apparent

A logical. Should an extra resample be added where the analysis and holdout subsets are both the entire data set? This is needed by some estimators in the summary function that rely on the apparent error rate.

...

Not currently used.

Value

A tibble with classes bootstraps, rset, tbl_df, tbl, and data.frame. The results include a column for the data split objects and a column called id that has a character string with the resample identifier.

Details

The argument apparent enables the option of an additional "resample" where the analysis and assessment data sets are the same as the original data set. This can be required for some types of analysis of the bootstrap results.

The strata argument is based on a similar argument in the random forest package, where the bootstrap samples are drawn within the stratification variable. This can help ensure that the number of data points from each stratum in the bootstrap sample is proportional to that stratum's share of the original data set.
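The idea of resampling within a stratification variable can be sketched in base R. This is a hedged illustration under simple assumptions, not the code that bootstraps() runs; dat, idx_by_stratum, and strat_idx are hypothetical names used only here.

```r
# Illustrative sketch in base R; not the internals of bootstraps().
set.seed(1)
dat <- iris[1:130, ]

# Group the row indices by stratum (here, the Species column).
idx_by_stratum <- split(seq_len(nrow(dat)), dat$Species)

# Resample with replacement inside each stratum, keeping its size fixed.
strat_idx <- unlist(
  lapply(idx_by_stratum, function(idx) {
    idx[sample.int(length(idx), replace = TRUE)]
  }),
  use.names = FALSE
)

# Each stratum keeps its original count, so the proportions are unchanged.
table(dat$Species[strat_idx])
table(dat$Species)
```

With 30 virginica rows out of 130, every stratified sample built this way has a virginica share of 30/130, which is consistent with the constant proportion seen in the stratified output in the Examples.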

Examples

bootstraps(mtcars, times = 2)
#> # Bootstrap sampling
#> # A tibble: 2 x 2
#>   splits          id
#>   <list>          <chr>
#> 1 <split [32/12]> Bootstrap1
#> 2 <split [32/13]> Bootstrap2
bootstraps(mtcars, times = 2, apparent = TRUE)
#> # Bootstrap sampling with apparent sample
#> # A tibble: 3 x 2
#>   splits          id
#>   <list>          <chr>
#> 1 <split [32/10]> Bootstrap1
#> 2 <split [32/15]> Bootstrap2
#> 3 <split [32/32]> Apparent
library(purrr)
#>
#> Attaching package: ‘purrr’
#> The following object is masked from ‘package:testthat’:
#>
#>     is_null
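# Drop some rows so that the classes are unbalanced
iris2 <- iris[1:130, ]

set.seed(13)
resample1 <- bootstraps(iris2, times = 3)
# Proportion of virginica rows in each unstratified analysis sample
map_dbl(resample1$splits,
        function(x) {
          dat <- as.data.frame(x)$Species
          mean(dat == "virginica")
        })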
#> [1] 0.2000000 0.2461538 0.2230769
set.seed(13)
resample2 <- bootstraps(iris2, strata = "Species", times = 3)
# Proportion of virginica rows in each stratified analysis sample
map_dbl(resample2$splits,
        function(x) {
          dat <- as.data.frame(x)$Species
          mean(dat == "virginica")
        })
#>         1         2         3
#> 0.2307692 0.2307692 0.2307692