Weights of evidence
Computes the weight of evidence (WOE) transform of factor variables for binary classification.
woe(x, ...)

## Default S3 method:
woe(x, grouping, weights = NULL, zeroadj = 0, ids = NULL, appont = TRUE, ...)

## S3 method for class 'formula'
woe(formula, data = NULL, weights = NULL, ...)
x | A matrix or data frame containing the explanatory variables.
grouping | A factor specifying the binary class for each observation.
formula | A formula of the form
data | Data frame from which variables specified in formula are to be taken.
weights | Vector with observation weights. For call
zeroadj | Additive constant to be added for a level with 0 observations in a class.
ids | Vector of either indices or variable names that specifies the variables to be transformed.
appont | Application on training data: logical indicating whether the transformed values for the training data should be returned by recursive calling of
... | For
Each factor level x is assigned a numeric value WOE(x) = ln(f(x|1)/f(x|2)), where 1 and 2 denote the class labels and f(x|k) is the relative frequency of level x within class k. The WOE transform is motivated by subsequent modelling with logistic regression. Note that the class frequencies should be checked beforehand. Information values heuristically quantify the discriminatory power of a variable: IV = sum over all levels x of (f(x|1) - f(x|2)) ln(f(x|1)/f(x|2)).
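The two formulas can be checked by hand in base R. The following is a minimal sketch on made-up toy data, not the package's internal implementation; the object names (x, class, woe_by_level, iv, x_woe) are hypothetical, and treating the first class label as class 1 is an assumption.

```r
## Toy data (made up for illustration): one factor and a binary class label
x     <- factor(c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c"))
class <- factor(c( 1,   1,   2,   1,   2,   2,   2,   2,   1,   2))

## Counts of each factor level within each class
tab <- table(x, class)

## Conditional relative frequencies f(x | class): each column sums to 1
f <- prop.table(tab, margin = 2)

## WOE(x) = ln(f(x|1) / f(x|2)) for every level
woe_by_level <- log(f[, "1"] / f[, "2"])

## IV = sum over levels of (f(x|1) - f(x|2)) * WOE(x)
iv <- sum((f[, "1"] - f[, "2"]) * woe_by_level)

## Replace each observation's level by its WOE value
x_woe <- unname(woe_by_level[as.character(x)])

print(woe_by_level)
print(iv)
```

In this toy table every level occurs in both classes, so all ratios are finite. A level with zero observations in one class would give an infinite WOE, which is the case the zeroadj argument is meant to handle; the sketch does not model that adjustment.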
Returns an object of class woe that can be applied to new data.
woe | WOE coefficients for factor2numeric transformation of each (specified) variable.
IV | Vector of information values of all transformed variables.
newx | Data frame of transformed data if
Gero Szepannek
Good, I. J. (1950): Probability and the Weighing of Evidence. Charles Griffin, London.
Kullback, S. (1959): Information Theory and Statistics. Wiley, New York.
## load the package and the German credit data
library(klaR)
data("GermanCredit")

## training/validation split
train <- sample(nrow(GermanCredit), round(0.6*nrow(GermanCredit)))

## fit the WOE transform on the training data
## (appont is the argument name given in the usage above)
woemodel <- woe(credit_risk~., data = GermanCredit[train,], zeroadj=0.5, appont = TRUE)
woemodel

## plot variable information values and woes
plot(woemodel)
plot(woemodel, type = "woes")

## apply woes
traindata <- predict(woemodel, GermanCredit[train,], replace = TRUE)
str(traindata)

## fit logistic regression model
glmodel <- glm(credit_risk~., traindata, family=binomial)
summary(glmodel)
pred.trn <- predict(glmodel, traindata, type = "response")

## predict validation data
validata <- predict(woemodel, GermanCredit[-train,], replace = TRUE)
pred.val <- predict(glmodel, validata, type = "response")