Prostate cancer data. Standardized.
Data with 8 inputs and one output, used in the textbook of Hastie, Tibshirani and Friedman (2009) to illustrate prediction and regression methods.
data(zprostate)
A data frame with 97 observations on 10 variables: 8 inputs, 1 output (lpsa) and a logical training-set indicator (train). All input variables have been standardized.
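What "standardized" means can be sketched in base R as centring each column to mean 0 and scaling it to standard deviation 1. This is a minimal sketch under assumptions: the documentation does not state whether the sample (n - 1) standard deviation was used or whether the full data or only the training rows were used to compute the means, and the `standardize` helper and toy values are hypothetical, not taken from zprostate.

```r
# Hypothetical helper: standardize every column to mean 0 and sd 1,
# i.e. z = (x - mean(x)) / sd(x). scale() uses the sample sd (n - 1 denominator).
standardize <- function(df) {
  as.data.frame(scale(df, center = TRUE, scale = TRUE))
}

# Made-up values for illustration only (not the prostate data)
raw <- data.frame(
  age    = c(50, 58, 64, 69, 75),
  lcavol = c(-0.58, 0.75, 1.35, 2.03, 2.78)
)

z <- standardize(raw)
print(round(colMeans(z), 10))      # column means, numerically zero
print(round(apply(z, 2, sd), 10))  # column standard deviations, equal to one
```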
lcavol
log of cancer volume
lweight
log prostate weight
age
age in years
lbph
log of the amount of benign prostatic hyperplasia
svi
seminal vesicle invasion
lcp
log of capsular penetration
gleason
Gleason score
pgg45
percentage of Gleason scores 4 or 5
lpsa
outcome: log of PSA (prostate-specific antigen)
train
logical: TRUE for observations in the training set, FALSE for observations in the test set
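The logical train column is meant for row indexing. A minimal sketch of the split on a small made-up data frame with the same layout (inputs, then lpsa, then train) follows; the object names `toy`, `toyTrain` and `toyTest` are hypothetical, and with the real data `zprostate$train` would play the role of `toy$train`.

```r
# Toy data frame mimicking the zprostate layout; all values are invented
toy <- data.frame(
  lcavol = c(-1.6, -2.0, -1.6, -2.2, -0.1, -0.2),
  lpsa   = c(-0.43, -0.16, -0.16, -0.16, 0.37, 0.77),
  train  = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
)

# Rows with train == TRUE form the training set, the remaining rows the test set.
# The train column itself is dropped so lm(lpsa ~ ., ...) does not use it as a predictor.
trainRows <- toy$train
keepCols  <- names(toy) != "train"
toyTrain  <- toy[trainRows, keepCols]
toyTest   <- toy[!trainRows, keepCols]

nrow(toyTrain)  # 4
nrow(toyTest)   # 2
```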
A study of 97 men with prostate cancer examined the correlation between the level of PSA (prostate-specific antigen) and a number of clinical measurements: lcavol, lweight, age, lbph, svi, lcp, gleason and pgg45.
Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer.
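# Prostate data: compare with the LS and Best Subset columns of Table 3.3 in HTF.
library(bestglm)                    # provides zprostate and bestglm()
data(zprostate)
trainQ <- zprostate[, 10]           # logical train indicator
train <- zprostate[trainQ, -10]     # training set; lpsa is the last column
test <- zprostate[!trainQ, -10]     # test set
# Least squares on all inputs
ans <- lm(lpsa ~ ., data = train)
sig <- summary(ans)$sigma           # residual standard deviation
yHat <- predict(ans, newdata = test)
yTest <- zprostate$lpsa[!trainQ]
TE <- mean((yTest - yHat)^2)        # test mean squared error
# Best subset chosen with the BICq criterion
ansSub <- bestglm(train, IC = "BICq")$BestModel
sigSub <- summary(ansSub)$sigma
yHatSub <- predict(ansSub, newdata = test)
TESub <- mean((yTest - yHatSub)^2)
# Collect test error and residual sd for both models
m <- matrix(c(TE, sig, TESub, sigSub), ncol = 2)
dimnames(m) <- list(c("TestErr", "Sd"), c("LS", "Best"))
m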
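The `bestglm()` call in the example searches over subsets of the inputs for the model that minimizes an information criterion. The idea can be sketched in base R with an exhaustive search, under stated assumptions: this sketch scores subsets with ordinary BIC from `stats::BIC`, not BICq, it uses synthetic data rather than zprostate, and the helpers `makeFormula` and `bestSubsetBIC` are hypothetical names introduced only for this illustration.

```r
# Simplified stand-in for bestglm(..., IC = "BICq"): exhaustive best-subset
# search scored by ordinary BIC. Synthetic data: only x1 and x2 affect y.
set.seed(1)
n <- 100
Xy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
Xy$y <- 3 * Xy$x1 - 2 * Xy$x2 + rnorm(n, sd = 0.5)  # response is the last column

# Hypothetical helper: build "y ~ a + b", or "y ~ 1" for the empty subset
makeFormula <- function(yName, vars) {
  rhs <- if (length(vars) == 0) "1" else paste(vars, collapse = " + ")
  as.formula(paste(yName, "~", rhs))
}

# Hypothetical helper: fit every subset of inputs (response assumed last)
# and keep the subset with the smallest BIC
bestSubsetBIC <- function(Xy) {
  p <- ncol(Xy) - 1
  xNames <- names(Xy)[seq_len(p)]
  yName <- names(Xy)[p + 1]
  best <- list(vars = character(0),
               bic = BIC(lm(makeFormula(yName, character(0)), data = Xy)))
  for (k in seq_len(p)) {
    for (vars in combn(xNames, k, simplify = FALSE)) {
      b <- BIC(lm(makeFormula(yName, vars), data = Xy))
      if (b < best$bic) best <- list(vars = vars, bic = b)
    }
  }
  best
}

# Select on the training rows, then measure test mean squared error
isTrain <- seq_len(n) <= 70
sel <- bestSubsetBIC(Xy[isTrain, ])
fitSub <- lm(makeFormula("y", sel$vars), data = Xy[isTrain, ])
yHatSub <- predict(fitSub, newdata = Xy[!isTrain, ])
teSub <- mean((Xy$y[!isTrain] - yHatSub)^2)
print(sel$vars)
print(teSub)
```

An exhaustive search fits 2^p models, which is cheap for the 8 prostate inputs (256 subsets) but grows quickly with p.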