Case-specific random forests.
In case-specific random forests (CSRF), a separate random forest is built for each case of interest. Instead of sampling the training cases with equal probability, each training case is weighted by its proximity to the case of interest.
csrf(formula, training_data, test_data, params1 = list(), params2 = list())
formula | Object of class
training_data | Training data of class
test_data | Test data of class
params1 | Parameters for the proximity random forest grown in the first step.
params2 | Parameters for the prediction random forests grown in the second step.
The algorithm consists of 3 steps:
1. Grow a random forest on the training data.
2. For each observation of interest (test data), the weights of all training observations are computed by counting the number of trees in which both observations are in the same terminal node.
3. For each test observation, grow a weighted random forest on the training data, using the weights obtained in step 2. Predict the outcome of the test observation as usual.
In total, n+1 random forests are grown, where n is the number of observations in the test dataset. For details, see Xu et al. (2014).
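The three steps above can be sketched by hand with plain `ranger()` calls. This is only an illustration, not necessarily how `csrf()` is implemented internally. It assumes that terminal node IDs are obtained with `predict(..., type = "terminalNodes")` and that the weights are passed through the `case.weights` argument. The object names (`rf_prox`, `case_weights`, `preds`) and the tree counts are made up for the example.

```r
library(ranger)

set.seed(1)
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test <- iris[-train_idx, ][1:5, ]  # a few test cases keep the sketch fast

## Step 1: grow one random forest on the training data
rf_prox <- ranger(Species ~ ., data = train, num.trees = 50)

## Terminal node IDs: one row per observation, one column per tree
train_nodes <- predict(rf_prox, train, type = "terminalNodes")$predictions
test_nodes <- predict(rf_prox, test, type = "terminalNodes")$predictions

## Step 2: for test case i, count the trees in which each training
## observation shares a terminal node with it, then normalise to sum to 1
case_weights <- function(i) {
  same_node <- sweep(train_nodes, 2, test_nodes[i, ], FUN = "==")
  counts <- rowSums(same_node)
  counts / sum(counts)
}

## Step 3: grow one weighted forest per test case and predict that case
preds <- vapply(seq_len(nrow(test)), function(i) {
  rf_i <- ranger(Species ~ ., data = train, num.trees = 5,
                 case.weights = case_weights(i))
  as.character(predict(rf_i, test[i, ])$predictions)
}, character(1))
```

With the five test cases used here, six forests are grown in total: one proximity forest plus one weighted forest per test case.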
Predictions for the test dataset.
Marvin N. Wright
Xu, R., Nettleton, D. & Nordman, D.J. (2014). Case-specific random forests. J Comp Graph Stat 25:49-65. https://doi.org/10.1080/10618600.2014.983641.
library(ranger)

## Split in training and test data
train.idx <- sample(nrow(iris), 2/3 * nrow(iris))
iris.train <- iris[train.idx, ]
iris.test <- iris[-train.idx, ]

## Run case-specific RF
csrf(Species ~ ., training_data = iris.train, test_data = iris.test,
     params1 = list(num.trees = 50, mtry = 4),
     params2 = list(num.trees = 5))