Recombine a CV.SuperLearner fit using a new metalearning method
Function to re-compute the V-fold cross-validated risk estimate for super learner using a new metalearning method. This function takes as input an existing CV.SuperLearner fit and applies the recombineSL
fit to each of the V Super Learner fits.
recombineCVSL(object, method = "method.NNloglik", verbose = FALSE, saveAll = TRUE, parallel = "seq")
object |
Fitted object from |
method |
A list (or a function to create a list) containing details on estimating the coefficients for the super learner and the model to combine the individual algorithms in the library. See |
verbose |
logical; TRUE for printing progress during the computation (helpful for debugging). |
saveAll |
Logical; Should the entire |
parallel |
Options for parallel computation of the V-fold step. Use "seq" (the default) for sequential computation. |
The function recombineCVSL
computes the usual V-fold cross-validated risk estimate for the super learner (and all algorithms in SL.library
for comparison), using a newly specified metalearning method. The weights for each algorithm in SL.library
are re-estimated using the new metalearner, however the base learner fits are not regenerated, so this function saves a lot of computation time as opposed to using the CV.SuperLearner
function with a new method
argument. The output is identical to the output from the CV.SuperLearner
function.
An object of class CV.SuperLearner
(a list) with components:
call |
The matched call. |
AllSL |
If |
SL.predict |
The predicted values from the super learner when each particular row was part of the validation fold. |
discreteSL.predict |
The traditional cross-validated selector. Picks the algorithm with the smallest cross-validated risk (in super learner terms, gives that algorithm coefficient 1 and all others 0). |
whichDiscreteSL |
A list of length |
library.predict |
A matrix with the predicted values from each algorithm in |
coef |
A matrix with the coefficients for the super learner on each fold. The columns are the algorithms in |
folds |
A list containing the row numbers for each validation fold. |
V |
Number of folds for |
libraryNames |
A character vector with the names of the algorithms in the library. The format is 'predictionAlgorithm_screeningAlgorithm' with '_All' used to denote the prediction algorithm run on all variables in X. |
SL.library |
Returns |
method |
A list with the method functions. |
Y |
The outcome |
Erin LeDell ledell@berkeley.edu
## Not run: # Binary outcome example adapted from SuperLearner examples set.seed(1) N <- 200 X <- matrix(rnorm(N*10), N, 10) X <- as.data.frame(X) Y <- rbinom(N, 1, plogis(.2*X[, 1] + .1*X[, 2] - .2*X[, 3] + .1*X[, 3]*X[, 4] - .2*abs(X[, 4]))) SL.library <- c("SL.glmnet", "SL.glm", "SL.knn", "SL.mean") # least squares loss function set.seed(1) # for reproducibility cvfit_nnls <- CV.SuperLearner(Y = Y, X = X, V = 10, SL.library = SL.library, verbose = TRUE, method = "method.NNLS", family = binomial()) cvfit_nnls$coef # SL.glmnet_All SL.glm_All SL.knn_All SL.gam_All SL.mean_All # 1 0.0000000 0.00000000 0.000000000 0.4143862 0.5856138 # 2 0.0000000 0.00000000 0.304802397 0.3047478 0.3904498 # 3 0.0000000 0.00000000 0.002897533 0.5544075 0.4426950 # 4 0.0000000 0.20322642 0.000000000 0.1121891 0.6845845 # 5 0.1743973 0.00000000 0.032471026 0.3580624 0.4350693 # 6 0.0000000 0.00000000 0.099881535 0.3662309 0.5338876 # 7 0.0000000 0.00000000 0.234876082 0.2942472 0.4708767 # 8 0.0000000 0.06424676 0.113988158 0.5600208 0.2617443 # 9 0.0000000 0.00000000 0.338030342 0.2762604 0.3857093 # 10 0.3022442 0.00000000 0.294226204 0.1394534 0.2640762 # negative log binomial likelihood loss function cvfit_nnloglik <- recombineCVSL(cvfit_nnls, method = "method.NNloglik") cvfit_nnloglik$coef # SL.glmnet_All SL.glm_All SL.knn_All SL.gam_All SL.mean_All # 1 0.0000000 0.0000000 0.00000000 0.5974799 0.40252010 # 2 0.0000000 0.0000000 0.31177345 0.6882266 0.00000000 # 3 0.0000000 0.0000000 0.01377469 0.8544238 0.13180152 # 4 0.0000000 0.1644188 0.00000000 0.2387919 0.59678930 # 5 0.2142254 0.0000000 0.00000000 0.3729426 0.41283197 # 6 0.0000000 0.0000000 0.00000000 0.5847150 0.41528502 # 7 0.0000000 0.0000000 0.47538172 0.5080311 0.01658722 # 8 0.0000000 0.0000000 0.00000000 1.0000000 0.00000000 # 9 0.0000000 0.0000000 0.45384961 0.2923480 0.25380243 # 10 0.3977816 0.0000000 0.27927906 0.1606384 0.16230097 # If we use the same seed as the original `cvfit_nnls`, then # the recombineCVSL and CV.SuperLearner results will be identical # however, the recombineCVSL version will be much faster since # it doesn't have to re-fit all the base learners, V times each. set.seed(1) cvfit_nnloglik2 <- CV.SuperLearner(Y = Y, X = X, V = 10, SL.library = SL.library, verbose = TRUE, method = "method.NNloglik", family = binomial()) cvfit_nnloglik2$coef # SL.glmnet_All SL.glm_All SL.knn_All SL.gam_All SL.mean_All # 1 0.0000000 0.0000000 0.00000000 0.5974799 0.40252010 # 2 0.0000000 0.0000000 0.31177345 0.6882266 0.00000000 # 3 0.0000000 0.0000000 0.01377469 0.8544238 0.13180152 # 4 0.0000000 0.1644188 0.00000000 0.2387919 0.59678930 # 5 0.2142254 0.0000000 0.00000000 0.3729426 0.41283197 # 6 0.0000000 0.0000000 0.00000000 0.5847150 0.41528502 # 7 0.0000000 0.0000000 0.47538172 0.5080311 0.01658722 # 8 0.0000000 0.0000000 0.00000000 1.0000000 0.00000000 # 9 0.0000000 0.0000000 0.45384961 0.2923480 0.25380243 # 10 0.3977816 0.0000000 0.27927906 0.1606384 0.16230097 ## End(Not run)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.