
superpc.cv

Cross-validation for supervised principal components


Description

This function uses a form of cross-validation to estimate the optimal feature threshold in supervised principal components

Usage

superpc.cv(fit,
           data,
           n.threshold=20,
           n.fold=NULL,
           folds=NULL,
           n.components=3,
           min.features=5,
           max.features=nrow(data$x),
           compute.fullcv=TRUE,
           compute.preval=TRUE,
           xl.mode=c("regular","firsttime","onetime","lasttime"),
           xl.time=NULL,
           xl.prevfit=NULL)

Arguments

fit

Object returned by superpc.train.

data

Data object of the form described in the superpc.train documentation.

n.threshold

Number of thresholds to consider. Default 20.

n.fold

Number of cross-validation folds. The default is around 10 (the program picks a convenient value based on the sample size).

folds

List of indices of cross-validation folds (optional)

n.components

Number of principal components to use in the cross-validation: 1, 2 or 3.

min.features

Minimum number of features to include when determining the range of thresholds. Default 5.

max.features

Maximum number of features to include when determining the range of thresholds. Default is the total number of features in the dataset.

compute.fullcv

Logical: should full cross-validation be done? Default TRUE.

compute.preval

Logical: should full pre-validation be done? Default TRUE.

xl.mode

Used by the Excel interface only.

xl.time

Used by the Excel interface only.

xl.prevfit

Used by the Excel interface only.
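The folds argument takes a list of sample indices, one element per cross-validation fold. A minimal base-R sketch of building such a list follows. The helper make_folds is hypothetical (it is not part of superpc), and the sketch assumes that samples are the columns of data$x, as in the Examples section, and that a plain random split is acceptable.

```r
# Hypothetical helper (not part of superpc): split sample indices 1..n into
# n.fold roughly equal, randomly permuted groups, returned as a list of
# integer vectors, one per fold.
make_folds <- function(n, n.fold = 10) {
  stopifnot(n.fold >= 2, n.fold <= n)
  perm <- sample.int(n)                   # random order of the samples
  groups <- rep_len(seq_len(n.fold), n)   # fold labels 1..n.fold, recycled
  unname(split(perm, groups))             # list of index vectors
}

set.seed(332)
folds <- make_folds(30, n.fold = 5)
lengths(folds)   # each of the 5 folds holds 6 of the 30 samples
```

A list built this way could then be supplied as superpc.cv(a, data, folds=folds), which keeps the fold assignment fixed across calls instead of relying on the default choice driven by n.fold.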

Details

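This function uses a form of cross-validation to estimate the optimal feature threshold in supervised principal components. To avoid problems with fitting Cox models to small validation datasets, it uses the "pre-validation" approach of Tibshirani and Efron (2002): each sample's predictor value is computed from a model fitted without that sample's fold, and the resulting pre-validated predictors are then assessed in a single model fitted to all samples.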
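The pre-validation idea can be sketched in base R. To stay self-contained, the sketch swaps in a regression outcome and lm() for the Cox model, and uses a simplified, hypothetical supervised-PC step: keep the n.keep features most correlated with the outcome in the training samples, then score samples on the first principal component of those features. It is not superpc's implementation; preval_score and n.keep are illustrative names.

```r
# Hypothetical sketch of pre-validation (not superpc's code).
# Assumptions: x is a features x samples matrix, y a numeric outcome per
# sample, folds a list of sample indices; lm() stands in for a Cox model.
preval_score <- function(x, y, folds, n.keep = 10) {
  n.keep <- min(n.keep, nrow(x))
  v.preval <- numeric(length(y))
  for (test in folds) {
    train <- setdiff(seq_along(y), test)
    # Univariate screening on the training samples only
    r <- abs(cor(t(x[, train, drop = FALSE]), y[train]))[, 1]
    keep <- order(r, decreasing = TRUE)[seq_len(n.keep)]
    # First principal component of the kept features, centered by training means
    xtr <- x[keep, train, drop = FALSE]
    mu <- rowMeans(xtr)
    u1 <- svd(xtr - mu)$u[, 1]
    # Orient the component so its training scores correlate positively with y;
    # otherwise arbitrary SVD sign flips could differ between folds
    if (cor(drop(crossprod(u1, xtr - mu)), y[train]) < 0) u1 <- -u1
    # Score the held-out samples with a direction learned without them
    v.preval[test] <- drop(crossprod(u1, x[keep, test, drop = FALSE] - mu))
  }
  # One model on all samples using the pre-validated predictor, and the
  # likelihood ratio statistic against the intercept-only model
  fit1 <- lm(y ~ v.preval)
  fit0 <- lm(y ~ 1)
  lr <- 2 * (as.numeric(logLik(fit1)) - as.numeric(logLik(fit0)))
  list(v.preval = v.preval, lr = lr)
}

set.seed(332)
x <- matrix(rnorm(50 * 30), ncol = 30)   # 50 features x 30 samples
y <- 10 + svd(x)$v[, 1] + 0.1 * rnorm(30)
folds <- split(sample.int(30), rep_len(1:5, 30))
res <- preval_score(x, y, folds, n.keep = 10)
res$lr   # pre-validated likelihood ratio statistic
```

The key design point is that no sample's predictor value depends on that sample's own outcome, so the single final fit on all samples gives a fairer assessment than refitting a model on each small held-out fold.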

Value

threshold

Vector of thresholds considered

nonzero

Number of features exceeding each value of the threshold

scor.preval

Likelihood ratio scores from pre-validation

scor

Full CV scores

folds

Indices of CV folds used

featurescores.folds

Feature scores for each fold

v.preval

The pre-validated predictors

type

Problem type ("survival" or "regression")

call

The matched calling sequence
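The threshold and nonzero components are linked through the feature scores. The sketch below shows one plausible way to space n.threshold thresholds between limits implied by min.features and max.features, and to count the features whose absolute score exceeds each threshold. The quantile-based spacing and the helper threshold_grid are assumptions for illustration, not a description of superpc's internal code.

```r
# Hypothetical helper (not superpc's internal code).
# scores: one score per feature; thresholds are applied to |scores|.
threshold_grid <- function(scores, n.threshold = 20,
                           min.features = 5, max.features = length(scores)) {
  a <- abs(scores)
  p <- length(a)
  # Assumed spacing: the lowest threshold leaves roughly max.features features,
  # the highest leaves roughly min.features features
  lower <- unname(quantile(a, 1 - max.features / p))
  upper <- unname(quantile(a, 1 - min.features / p))
  threshold <- seq(lower, upper, length.out = n.threshold)
  # Number of features whose |score| exceeds each threshold
  nonzero <- vapply(threshold, function(t) sum(a > t), integer(1))
  list(threshold = threshold, nonzero = nonzero)
}

set.seed(332)
scores <- rnorm(50)   # synthetic feature scores
g <- threshold_grid(scores, n.threshold = 20, min.features = 5, max.features = 50)
cbind(threshold = g$threshold, nonzero = g$nonzero)
```

As the threshold rises, nonzero can only stay the same or fall, which is why a CV curve plotted against threshold also traces a path from large to small feature sets.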

Author(s)

  • "Eric Bair, Ph.D."

  • "Jean-Eudes Dazard, Ph.D."

  • "Rob Tibshirani, Ph.D."

Maintainer: "Jean-Eudes Dazard, Ph.D."

References

  • E. Bair and R. Tibshirani (2004). "Semi-supervised methods to predict patient survival from gene expression data." PLoS Biol, 2(4):e108.

  • E. Bair, T. Hastie, D. Paul, and R. Tibshirani (2006). "Prediction by supervised principal components." J. Am. Stat. Assoc., 101(473):119-137.

Examples

## Not run: 
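library(superpc)
set.seed(332)

# generate some data: 50 features (rows) measured on 30 samples (columns)
x <- matrix(rnorm(50*30), ncol=30)
# survival times driven by the first right singular vector of x, plus noise
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
# censoring indicator: 20 events (1) and 10 censored samples (0), in random order
censoring.status <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x, 
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)

# fit supervised principal components, then cross-validate the feature threshold
a <- superpc.train(data, type="survival")
aa <- superpc.cv(a, data)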

## End(Not run)

superpc

Supervised Principal Components

v1.12
GPL (>= 3) | file LICENSE
Authors
Eric Bair [aut], Jean-Eudes Dazard [cre, ctb], Rob Tibshirani [ctb]
Initial release
2020-10-19
