
zprostate

Prostate cancer data. Standardized.


Description

A data set with 8 inputs and one output, used to illustrate the prediction problem and regression in the textbook by Hastie, Tibshirani and Friedman (2009).

Usage

data(zprostate)

Format

A data frame with 97 observations on 10 variables: 8 inputs, 1 output and a training-set indicator. All input variables have been standardized.

lcavol

log cancer volume

lweight

log prostate weight

age

age in years

lbph

log benign prostatic hyperplasia

svi

seminal vesicle invasion

lcp

log of capsular penetration

gleason

Gleason score

pgg45

percent of Gleason scores 4 or 5

lpsa

Outcome. Log of PSA

train

Logical. TRUE if the observation belongs to the training set, FALSE if it belongs to the test set.
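The statement that the inputs have been standardized can be illustrated with a small sketch. It assumes that "standardized" means each column is centered to mean 0 and rescaled to sample standard deviation 1, which is what base R's scale() does by default. The helper column_summary and the toy data frame raw are hypothetical and are not part of bestglm.

```r
# Hypothetical helper: report the mean and standard deviation of each column.
column_summary <- function(df) {
  data.frame(
    mean = sapply(df, mean),
    sd   = sapply(df, sd)
  )
}

# Toy data, made up for illustration (not taken from zprostate).
raw <- data.frame(
  x1 = c(2, 4, 6, 8, 10),
  x2 = c(1.5, 0.3, 2.2, 0.9, 1.1)
)

# scale() centers each column to mean 0 and rescales it to sample sd 1.
standardized <- as.data.frame(scale(raw))

toy_summary <- column_summary(standardized)
print(round(toy_summary, 10))
```

With bestglm loaded and data(zprostate) run, calling column_summary on the eight input columns would show whether they follow this convention. Whether the scaling was computed over all 97 rows or over a subset is not stated on this page.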

Details

A study of 97 men with prostate cancer examined the correlation between PSA (prostate-specific antigen) and a number of clinical measurements: lcavol, lweight, age, lbph, svi, lcp, gleason and pgg45.
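As a rough sketch of the kind of relationship described above, the correlation between each input and lpsa can be computed directly. This assumes the bestglm package is installed and that all eight inputs are stored as numeric columns; the object names inputs, r and r_sorted are chosen here for illustration only.

```r
library(bestglm)  # assumed to be installed; provides zprostate
data(zprostate)

inputs <- c("lcavol", "lweight", "age", "lbph",
            "svi", "lcp", "gleason", "pgg45")

# Pearson correlation of each input with the outcome lpsa.
r <- sapply(zprostate[inputs], function(x) cor(x, zprostate$lpsa))

# Sort from strongest to weakest absolute correlation.
r_sorted <- r[order(abs(r), decreasing = TRUE)]
print(round(r_sorted, 3))
```

These are marginal correlations over all 97 observations. They do not account for dependence among the inputs, which is what the regression in the Examples section addresses.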

References

Hastie, Tibshirani & Friedman. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd Ed. Springer.

Examples

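# Prostate data. Table 3.3, HTF.
library(bestglm)
data(zprostate)
# Full model: least squares on all inputs, fitted to the training rows
trainQ <- zprostate[, 10]          # logical train indicator (column 10)
train <- zprostate[trainQ, -10]
test <- zprostate[!trainQ, -10]
ans <- lm(lpsa ~ ., data = train)
sig <- summary(ans)$sigma          # residual standard error of the fit
yHat <- predict(ans, newdata = test)
yTest <- zprostate$lpsa[!trainQ]
TE <- mean((yTest - yHat)^2)       # mean squared error on the test rows
# Best subset model, selected with the BICq criterion
ansSub <- bestglm(train, IC = "BICq")$BestModel
sigSub <- summary(ansSub)$sigma
yHatSub <- predict(ansSub, newdata = test)
TESub <- mean((yTest - yHatSub)^2)
# Compare test error and residual standard error for the two fits
m <- matrix(c(TE, sig, TESub, sigSub), ncol = 2)
dimnames(m) <- list(c("TestErr", "Sd"), c("LS", "Best"))
m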

bestglm

Best Subset GLM and Regression Utilities

v0.37.3
GPL (>= 2)
Authors
A.I. McLeod, Changjiang Xu and Yuanhao Lai
Initial release
2020-03-13
