Cross-validation for supervised principal components
This function uses a form of cross-validation to estimate the optimal feature threshold in supervised principal components.
superpc.cv(fit, data, n.threshold=20, n.fold=NULL, folds=NULL, n.components=3, min.features=5, max.features=nrow(data$x), compute.fullcv=TRUE, compute.preval=TRUE, xl.mode=c("regular","firsttime","onetime","lasttime"), xl.time=NULL, xl.prevfit=NULL)
fit |
Object returned by superpc.train |
data |
Data object of form described in superpc.train documentation |
n.threshold |
Number of thresholds to consider. Default 20. |
n.fold |
Number of cross-validation folds. Default is around 10 (the program picks a convenient value based on the sample size). |
folds |
List of indices of cross-validation folds (optional) |
n.components |
Number of principal components to use in cross-validation: 1, 2 or 3. Default 3. |
min.features |
Minimum number of features to include in determining range for threshold. Default 5. |
max.features |
Maximum number of features to include in determining range for threshold. Default is the total number of features in the dataset. |
compute.fullcv |
Should full cross-validation be done? |
compute.preval |
Should full pre-validation be done? |
xl.mode |
Used by Excel interface only |
xl.time |
Used by Excel interface only |
xl.prevfit |
Used by Excel interface only |
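The folds argument is described only as a list of indices. As a minimal sketch, assuming each list element holds the sample (column) indices held out in one fold, such a list could be built in base R as follows. The values of n and n.fold here are illustrative, not package defaults.

```r
set.seed(1)                        # reproducible fold assignment
n <- 30                            # number of samples (columns of data$x); illustrative
n.fold <- 5                        # hypothetical number of folds

# Shuffle the sample indices, then deal them into n.fold groups of
# near-equal size. Assumption: one list element per fold, each holding
# the indices of the samples left out in that fold.
fold.id <- rep(seq_len(n.fold), length.out = n)
folds <- split(sample(n), fold.id)

# Every sample should appear in exactly one fold.
stopifnot(length(folds) == n.fold,
          setequal(unlist(folds), seq_len(n)))
```

Supplying folds explicitly rather than n.fold is mainly useful when the same partition has to be reused across several runs.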
This function uses a form of cross-validation to estimate the optimal feature threshold in supervised principal components. To avoid problems with fitting Cox models to small validation datasets, it uses the "pre-validation" approach of Tibshirani and Efron (2002).
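The idea of pre-validation can be illustrated with a small base R sketch. This is not the package's implementation: it assumes a continuous outcome fitted with lm rather than a Cox model, uses absolute correlation as a stand-in feature score, and the cutoff, fold count and fallback to the top min.features features are hypothetical choices made for the illustration. For each fold, the features are selected and the first principal component is computed from the training samples only; the held-out samples are then projected onto it. The projections are pooled into one pre-validated predictor, so a single model is fitted to all samples instead of one model per small validation set.

```r
set.seed(332)
p <- 50; n <- 30
x <- matrix(rnorm(p * n), ncol = n)           # features in rows, samples in columns
y <- 10 + svd(x)$v[, 1] + 0.1 * rnorm(n)      # outcome driven by the leading component

n.fold <- 5                                    # hypothetical fold count
folds <- split(sample(n), rep(seq_len(n.fold), length.out = n))
threshold <- 0.3                               # hypothetical cutoff on |correlation|
min.features <- 5                              # guard so every fold keeps some features

v.preval <- numeric(n)                         # pre-validated predictor, one value per sample
for (test in folds) {
  x.tr <- x[, -test, drop = FALSE]
  y.tr <- y[-test]

  # Feature scores from training samples only (stand-in: absolute correlation).
  score <- abs(drop(cor(t(x.tr), y.tr)))
  keep <- which(score >= threshold)
  if (length(keep) < min.features) {
    keep <- order(score, decreasing = TRUE)[seq_len(min.features)]
  }

  # First principal component of the selected, centered training features.
  mu <- rowMeans(x.tr[keep, , drop = FALSE])
  xc <- x.tr[keep, , drop = FALSE] - mu
  u <- svd(xc)$u[, 1]

  # The sign of a singular vector is arbitrary; orient it so that the
  # training component correlates positively with the outcome in every fold.
  if (cor(drop(crossprod(xc, u)), y.tr) < 0) u <- -u

  # Project the held-out samples using training-derived quantities only.
  v.preval[test] <- drop(crossprod(x[keep, test, drop = FALSE] - mu, u))
}

# One model on all samples, using the pooled pre-validated predictor.
fit <- lm(y ~ v.preval)
summary(fit)$coefficients
```

Because each value of v.preval was computed without that sample's outcome, the fitted model can be assessed roughly as if the predictor had come from an independent data set.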
threshold |
Vector of thresholds considered |
nonzero |
Number of features exceeding each value of the threshold |
scor.preval |
Likelihood ratio scores from pre-validation |
scor |
Full CV scores |
folds |
Indices of CV folds used |
featurescores.folds |
Feature scores for each fold |
v.preval |
The pre-validated predictors |
type |
Problem type |
call |
Calling sequence |
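The relationship between the threshold and nonzero components, and the role of min.features and max.features, can be sketched in base R. This is an illustration under stated assumptions, not necessarily how superpc.cv builds its grid: the feature scores are simulated, and the grid is assumed to run between the quantiles of the absolute scores at which about max.features and min.features features remain.

```r
set.seed(1)
p <- 50
feature.scores <- rnorm(p)         # simulated stand-in for per-feature scores
n.threshold <- 20
min.features <- 5
max.features <- p

a <- abs(feature.scores)

# Assumed construction: the lowest cutoff lets about max.features features
# through, the highest cutoff lets about min.features features through.
lower <- unname(quantile(a, 1 - max.features / p))
upper <- unname(quantile(a, 1 - min.features / p))
threshold <- seq(lower, upper, length.out = n.threshold)

# Number of features whose absolute score exceeds each threshold.
nonzero <- sapply(threshold, function(th) sum(a > th))

data.frame(threshold = round(threshold, 3), nonzero = nonzero)
```

As the threshold rises, nonzero can only stay the same or decrease, which is why each candidate threshold corresponds to a progressively smaller feature set.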
"Eric Bair, Ph.D."
"Jean-Eudes Dazard, Ph.D."
"Rob Tibshirani, Ph.D."
Maintainer: "Jean-Eudes Dazard, Ph.D."
E. Bair and R. Tibshirani (2004). "Semi-supervised methods to predict patient survival from gene expression data." PLoS Biol, 2(4):e108.
E. Bair, T. Hastie, D. Paul, and R. Tibshirani (2006). "Prediction by supervised principal components." J. Am. Stat. Assoc., 101(473):119-137.
## Not run:
set.seed(332)

# generate some data
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, y=y, censoring.status=censoring.status, featurenames=featurenames)

a <- superpc.train(data, type="survival")
aa <- superpc.cv(a, data)
## End(Not run)