
C5imp

Variable Importance Measures for C5.0 Models


Description

This function calculates the variable importance (aka attribute usage) for C5.0 models.

Usage

C5imp(object, metric = "usage", pct = TRUE, ...)

Arguments

object

an object of class C5.0

metric

either 'usage' or 'splits' (see Details below)

pct

a logical: should the importance values be converted to be between 0 and 100?

...

other options (not currently used)

Details

By default (metric = "usage"), C5.0 measures the importance of a predictor as the percentage of training set samples that fall into the terminal nodes below a split on that predictor. For example, the predictor in the first split automatically has an importance of 100 percent, because every training sample passes through that split. Other predictors may be used frequently in splits, but if the terminal nodes below those splits cover only a handful of training set samples, their importance scores may be close to zero. The same strategy is applied to rule-based models and to the corresponding boosted versions of the model.
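The usage calculation can be illustrated on a small hand-built tree. This is only a sketch of the definition above, not the code that C5imp runs: the leaf sizes, the predictor names x1, x2 and x3, and the leaf_paths structure are all hypothetical.

```r
# Hypothetical tree with four terminal nodes (leaves).
# leaf_n: number of training samples that end up in each leaf.
leaf_n <- c(60, 30, 8, 2)

# leaf_paths: predictors tested on the way from the root to each leaf.
# x1 is the first split, so it appears on every path.
leaf_paths <- list(
  c("x1"),
  c("x1", "x2"),
  c("x1", "x2", "x3"),
  c("x1", "x2", "x3")
)

predictors <- c("x1", "x2", "x3")

# Usage: percentage of training samples whose path includes the predictor.
usage <- sapply(predictors, function(p) {
  covered <- vapply(leaf_paths, function(path) p %in% path, logical(1))
  100 * sum(leaf_n[covered]) / sum(leaf_n)
})

# Same layout as the documented return value: one column named Overall,
# with the predictors as row names.
imp <- data.frame(Overall = usage, row.names = predictors)
print(imp)
```

In this toy tree x3 appears in as many leaf paths as x1 and x2 combined would suggest it matters, yet it covers only 10 of the 100 samples, so its usage is low.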

The attribute usage numbers reported by this function differ from those in the standard command line output of C5.0. The calculations are almost exactly the same (this function does not add 1/2 to everything), but the C code does not report an attribute as used if the percentage of training samples covered by the corresponding splits is very low. Here, the threshold was lowered and the fractional usage is shown.

When metric = "splits", the percentage of splits associated with each predictor is calculated.
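As a rough illustration of the "splits" definition, the share of splits per predictor can be computed from a list of the predictors used at each split. The split_vars vector below is hypothetical and the code only sketches the idea; it is not taken from the package source.

```r
# Hypothetical tree with four splits; each entry names the predictor
# used at one split.
split_vars <- c("x1", "x2", "x2", "x3")

# Percentage of all splits that use each predictor.
split_pct <- 100 * prop.table(table(split_vars))

# Arrange as a data frame with an Overall column and predictor row names.
imp_splits <- data.frame(
  Overall = as.numeric(split_pct),
  row.names = names(split_pct)
)
print(imp_splits)
```

Unlike the usage metric, this measure ignores how many training samples each split covers, so a predictor used in many small, deep splits can still score highly.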

Value

A data frame with a column Overall containing the predictor importance values. The row names identify the predictors.
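Because the predictor names are stored as row names, it is often convenient to move them into a regular column before sorting or filtering. The sketch below uses a hand-built stand-in for the returned data frame; the importance values are made up for illustration and do not come from a fitted model.

```r
# Stand-in for a C5imp() result: one Overall column, predictors as row names.
# The numbers are invented for this example.
imp <- data.frame(
  Overall = c(100, 12.5, 0, 48.2),
  row.names = c("total_day_minutes", "voice_mail_plan",
                "state", "international_plan")
)

# Move the row names into an explicit column.
imp_tbl <- data.frame(
  predictor = rownames(imp),
  Overall = imp$Overall,
  stringsAsFactors = FALSE
)

# Sort from most to least important.
imp_tbl <- imp_tbl[order(imp_tbl$Overall, decreasing = TRUE), ]

# Keep only the predictors with non-zero importance.
used <- imp_tbl$predictor[imp_tbl$Overall > 0]
print(used)
```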

Author(s)

Original GPL C code by Ross Quinlan, R code and modifications to C by Max Kuhn, Steve Weston and Nathan Coulter

References

Quinlan R (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, http://www.rulequest.com/see5-unix.html

See Also

Examples

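library(C50)        # provides C5.0() and C5imp()
library(modeldata)  # provides the mlc_churn data set
data(mlc_churn)

# Fit a single tree on the first 3333 rows; column 20 is the outcome (churn)
treeModel <- C5.0(x = mlc_churn[1:3333, -20], y = mlc_churn$churn[1:3333])

# Importance as the percentage of training samples covered (the default)
C5imp(treeModel)
# Importance as the percentage of splits that use each predictor
C5imp(treeModel, metric = "splits")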

C50

C5.0 Decision Trees and Rule-Based Models

v0.1.3.1
GPL-3
Authors
Max Kuhn [aut, cre], Steve Weston [ctb], Mark Culp [ctb], Nathan Coulter [ctb], Ross Quinlan [aut] (Author of imported C code), RuleQuest Research [cph] (Copyright holder of imported C code), Rulequest Research Pty Ltd. [cph] (Copyright holder of imported C code)
Initial release
