Importance of features in a model.
Creates a data.table of feature importances.
xgb.importance(
  feature_names = NULL,
  model = NULL,
  trees = NULL,
  data = NULL,
  label = NULL,
  target = NULL
)
feature_names
character vector of feature names. If the model already contains feature names, those would be used when feature_names = NULL (the default). A non-NULL feature_names can be provided to override the names stored in the model.
model
object of class xgb.Booster.
trees
(only for the gbtree booster) an integer vector of tree indices that should be included in the importance calculation. If set to NULL, all trees of the model are parsed. This can be useful, e.g., in multiclass classification to get the feature importances of each class separately. Note that tree indices are zero-based (see the sketch below).
data
deprecated.
label
deprecated.
target
deprecated.
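A minimal sketch of the zero-based trees argument (assuming bst is a fitted gbtree booster such as the one in the examples below):

# tree indices are zero-based, so 0:4 selects the first five trees
xgb.importance(model = bst, trees = 0:4)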
This function works for both linear and tree models.
For linear models, the importance is the absolute magnitude of the linear coefficients. For that reason, to obtain a meaningful ranking by importance for a linear model, the features need to be on the same scale (which is also advisable when using L1 or L2 regularization).
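As an illustration, standardizing the feature matrix with scale() before fitting a gblinear booster makes the absolute coefficients comparable across features. A minimal sketch (bst_lin is a hypothetical name; the multiclass gblinear example below uses the same approach):

# standardize columns to zero mean / unit variance so that the
# absolute coefficients reported by xgb.importance are comparable
x <- scale(as.matrix(iris[, -5]))
bst_lin <- xgboost(data = x, label = as.numeric(iris$Species == "setosa"),
                   booster = "gblinear", nrounds = 20,
                   objective = "binary:logistic")
xgb.importance(model = bst_lin)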
For a tree model, the result is a data.table with the following columns (a usage sketch follows the list):
Features
names of the features used in the model;
Gain
fractional contribution of each feature to the model, based on the total gain of this feature's splits. A higher percentage means a more important predictive feature;
Cover
metric of the number of observations related to this feature;
Frequency
percentage representing the relative number of times a feature has been used in trees.
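Since the result is a plain data.table, it can be inspected with the usual data.table syntax. A sketch, assuming bst is a fitted gbtree booster:

imp <- xgb.importance(model = bst)
# Gain is fractional, so it sums to 1 across all features
imp[, sum(Gain)]
# rank features by predictive contribution
imp[order(-Gain)]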
A linear model's importance data.table has the following columns (see the sketch after the list):
Features
names of the features used in the model;
Weight
the linear coefficient of this feature;
Class
(only for multiclass models) class label.
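For a multiclass gblinear model there is one row per feature and class, so per-class coefficients can be extracted with a data.table filter. A sketch, assuming mbst is the multiclass gblinear model from the examples and Class holds the zero-based class index:

imp <- xgb.importance(model = mbst)
# coefficients for class 0 only, largest magnitude first
imp[Class == 0][order(-abs(Weight))]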
If feature_names is not provided and the model doesn't have feature_names, the index of the features will be used instead. Because the index is extracted from the model dump (based on C++ code), it starts at 0 (as in C/C++ or Python) instead of 1 (as usual in R).
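If the original column names are known, they can be re-attached by passing them explicitly. A sketch with hypothetical names for a four-feature model:

# map the 0-based feature indices back to readable labels
xgb.importance(feature_names = c("sepal_len", "sepal_wid", "petal_len", "petal_wid"),
               model = mbst)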
# binomial classification using gbtree:
data(agaricus.train, package = 'xgboost')
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max_depth = 2, eta = 1, nthread = 2, nrounds = 2,
               objective = "binary:logistic")
xgb.importance(model = bst)

# binomial classification using gblinear:
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               booster = "gblinear", eta = 0.3, nthread = 1, nrounds = 20,
               objective = "binary:logistic")
xgb.importance(model = bst)

# multiclass classification using gbtree:
nclass <- 3
nrounds <- 10
mbst <- xgboost(data = as.matrix(iris[, -5]), label = as.numeric(iris$Species) - 1,
                max_depth = 3, eta = 0.2, nthread = 2, nrounds = nrounds,
                objective = "multi:softprob", num_class = nclass)

# all classes clumped together:
xgb.importance(model = mbst)

# inspect importances separately for each class:
xgb.importance(model = mbst, trees = seq(from = 0, by = nclass, length.out = nrounds))
xgb.importance(model = mbst, trees = seq(from = 1, by = nclass, length.out = nrounds))
xgb.importance(model = mbst, trees = seq(from = 2, by = nclass, length.out = nrounds))

# multiclass classification using gblinear:
mbst <- xgboost(data = scale(as.matrix(iris[, -5])), label = as.numeric(iris$Species) - 1,
                booster = "gblinear", eta = 0.2, nthread = 1, nrounds = 15,
                objective = "multi:softprob", num_class = nclass)
xgb.importance(model = mbst)