downSample

Down- and Up-Sampling Imbalanced Data


Description

downSample randomly samples a data set so that all classes have the same frequency as the minority class. upSample samples with replacement so that all classes have the same frequency as the majority class.

Usage

downSample(x, y, list = FALSE, yname = "Class")

upSample(x, y, list = FALSE, yname = "Class")

Arguments

x

a matrix or data frame of predictor variables

y

a factor variable with the class memberships

list

should the function return list(x, y) or bind x and y together? If FALSE, the output will be coerced to a data frame.

yname

if list = FALSE, a label for the class column

Details

Simple random sampling is used to down-sample the majority class(es). Note that the minority class data are left intact and that the rows will be re-ordered in the down-sampled version.

For up-sampling, all the original data are left intact and additional rows, drawn with replacement, are added to the minority classes.
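The two sampling schemes described above can be sketched in base R. This is only an illustration of the idea, not caret's internal code; the toy data and the names `idx_by_class`, `down_idx`, and `up_idx` are made up for the example.

```r
set.seed(42)

# Toy imbalanced data (hypothetical): 50 "a", 20 "b", 10 "c"
y <- factor(rep(c("a", "b", "c"), times = c(50, 20, 10)))
x <- data.frame(v1 = rnorm(length(y)), v2 = runif(length(y)))

# Row indices grouped by class
idx_by_class <- split(seq_along(y), y)

# Down-sampling: draw n_min rows per class without replacement,
# so the smallest class is kept whole
n_min <- min(table(y))
down_idx <- unlist(lapply(idx_by_class, function(i) {
  i[sample.int(length(i), n_min)]
}))

# Up-sampling: keep every original row and add rows drawn with
# replacement until each class reaches the size of the largest class
n_max <- max(table(y))
up_idx <- unlist(lapply(idx_by_class, function(i) {
  c(i, i[sample.int(length(i), n_max - length(i), replace = TRUE)])
}))

table(y[down_idx])  # every class has n_min rows
table(y[up_idx])    # every class has n_max rows
```

Indexing with `sample.int(length(i), ...)` rather than calling `sample(i, ...)` avoids the base R behaviour where `sample()` on a single number treats it as `1:n`.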

Value

Either a data frame or a list with elements x and y.
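As a rough illustration of the two return shapes, the sketch below calls downSample on an artificially imbalanced subset of the built-in iris data. The subset indices and the column label "Species" are choices made for this example only.

```r
library(caret)
set.seed(1)

# Imbalanced subset of iris: 50 setosa, 20 versicolor, 10 virginica
idx <- c(1:50, 51:70, 101:110)
x <- iris[idx, 1:4]
y <- iris$Species[idx]

# list = FALSE (default): one data frame, class column named by yname
down_df <- downSample(x, y, yname = "Species")
table(down_df$Species)

# list = TRUE: a list with elements x (predictors) and y (classes)
down_list <- downSample(x, y, list = TRUE)
names(down_list)
```

With list = FALSE the class column is appended as the last column of the data frame, so it can be passed straight to a modelling function that takes a formula.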

Author(s)

Max Kuhn

Examples

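## A ridiculous example...
library(caret)
data(oil)

## Class frequencies before resampling
table(oilType)

## Reduce every class to the size of the smallest class
downSample(fattyAcids, oilType)
## Grow every class to the size of the largest class
upSample(fattyAcids, oilType)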

caret

Classification and Regression Training

v6.0-86
GPL (>= 2)
Authors
Max Kuhn [aut, cre], Jed Wing [ctb], Steve Weston [ctb], Andre Williams [ctb], Chris Keefer [ctb], Allan Engelhardt [ctb], Tony Cooper [ctb], Zachary Mayer [ctb], Brenton Kenkel [ctb], R Core Team [ctb], Michael Benesty [ctb], Reynald Lescarbeau [ctb], Andrew Ziem [ctb], Luca Scrucca [ctb], Yuan Tang [ctb], Can Candan [ctb], Tyler Hunt [ctb]
Initial release
