microeco: trans_classifier – R documentation

trans_classifier

Create trans_classifier object for machine-learning-based model prediction.

Description

This class wraps methods for machine-learning-based classification models, including data pre-processing, feature selection, data splitting, model training, prediction, confusion matrix calculation, and ROC (Receiver Operating Characteristic) or PR (Precision-Recall) curves.

Author(s): Felipe Mansoldo and Chi Liu

Methods

Public methods

trans_classifier$new()
trans_classifier$cal_preProcess()
trans_classifier$cal_feature_sel()
trans_classifier$cal_split()
trans_classifier$set_trainControl()
trans_classifier$cal_train()
trans_classifier$cal_feature_imp()
trans_classifier$cal_predict()
trans_classifier$plot_confusionMatrix()
trans_classifier$cal_ROC()
trans_classifier$plot_ROC()
trans_classifier$clone()

Method `new()`

Create the trans_classifier object.

Usage

trans_classifier$new(
  dataset = NULL,
  x.predictors = "all",
  y.response = NULL,
  n.cores = 1
)

Arguments

dataset

an object of the microtable class.

x.predictors

default "all"; character string or data.frame. A character string selects the corresponding table from microtable$taxa_abund; a data.frame supplies other customized data. Available options:

'all': use all the taxa stored in microtable$taxa_abund.
'Genus': use the Genus-level table in microtable$taxa_abund; any other taxonomic rank can be given in the same way, e.g. 'Phylum'.
other input: must be a data.frame in the same format as the tables in microtable$taxa_abund, i.e. rows are features and columns are samples, with the same sample names as in sample_table.

y.response

default NULL; the name of the response variable (column) in sample_table.

n.cores

default 1; the number of CPU threads to use.

Returns

data_feature and data_response, stored in the object.

Examples

\donttest{
data(dataset)
t1 <- trans_classifier$new(
  dataset = dataset,
  x.predictors = "Genus",
  y.response = "Group")
}
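
When x.predictors is a data.frame, it has to be laid out like the tables in microtable$taxa_abund. The base-R sketch below builds a toy feature table and checks its orientation. The object names (toy_sample_table, toy_features) and all values are made up for illustration and are not part of microeco.

```r
# Hypothetical toy data; names and values are illustrative only.
set.seed(1)
toy_sample_table <- data.frame(
  Group = rep(c("A", "B"), each = 3),
  row.names = paste0("S", 1:6)
)

# Rows are features, columns are samples (same orientation as taxa_abund tables).
toy_features <- as.data.frame(
  matrix(runif(4 * 6), nrow = 4,
         dimnames = list(paste0("feature_", 1:4), rownames(toy_sample_table)))
)

# Every column name should match a sample name in the sample table.
all_matched <- all(colnames(toy_features) %in% rownames(toy_sample_table))
print(dim(toy_features))
print(all_matched)
```

A table shaped like toy_features could then be supplied as x.predictors, together with a microtable object whose sample_table uses the same sample names.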

Method `cal_preProcess()`

Pre-process the feature data (centering, scaling, etc.) with the caret::preProcess function. See https://topepo.github.io/caret/pre-processing.html for more details.

Usage

trans_classifier$cal_preProcess(...)

Arguments

...: parameters passed to the preProcess function of the caret package.

Returns

the transformed data_feature, stored in the object.

Examples

\dontrun{
t1$cal_preProcess(method = c("center", "scale", "nzv"))
}
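
To make the effect of method = c("center", "scale") concrete, the following base-R sketch centers and scales a toy samples-by-features matrix with scale(). It only illustrates the arithmetic and does not call caret; the toy matrix is an assumption.

```r
# Toy matrix: 5 samples (rows) x 3 features (columns); values are illustrative.
x <- matrix(c(1, 2, 3, 4, 5,
              10, 20, 30, 40, 50,
              2, 2, 4, 4, 8),
            ncol = 3,
            dimnames = list(paste0("S", 1:5), c("f1", "f2", "f3")))

# Centering subtracts each column mean;
# scaling divides by each column's standard deviation.
x_std <- scale(x, center = TRUE, scale = TRUE)

col_means <- colMeans(x_std)
col_sds <- apply(x_std, 2, sd)
print(round(col_means, 10))
print(col_sds)
```

The "nzv" option in the example above additionally drops near-zero-variance predictors; that step is not shown in this sketch.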

Method `cal_feature_sel()`

Perform feature selection. See https://topepo.github.io/caret/feature-selection-overview.html for more details.

Usage

trans_classifier$cal_feature_sel(
  boruta.maxRuns = 300,
  boruta.pValue = 0.01,
  boruta.repetitions = 4,
  ...
)

Arguments

boruta.maxRuns: default 300; maximal number of importance source runs; passed to the maxRuns parameter of the Boruta function in the Boruta package.
boruta.pValue: default 0.01; p value passed to the pValue parameter of the Boruta function in the Boruta package.
boruta.repetitions: default 4; number of repeated runs of the feature selection.
...: parameters passed to the Boruta function of the Boruta package.

Returns

optimized data_feature in the object.

Examples

\donttest{
t1$cal_feature_sel(boruta.maxRuns = 300, boruta.pValue = 0.01)
}

Method `cal_split()`

Split data for training and testing.

Usage

trans_classifier$cal_split(prop.train = 3/4)

Arguments

prop.train: default 3/4; the proportion of the dataset used for training.

Returns

data_train and data_test in the object.

Examples

\donttest{
t1$cal_split(prop.train = 3/4)
}
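
The meaning of prop.train can be illustrated with a base-R sketch that draws 3/4 of the samples within each class, so that class proportions are preserved in both parts. Whether cal_split stratifies in exactly this way is an assumption; the helper split_by_class and the toy response are hypothetical.

```r
# Hypothetical helper: sample a proportion of indices within each class.
split_by_class <- function(y, prop.train = 3/4) {
  idx_by_class <- split(seq_along(y), y)
  train_idx <- unlist(lapply(idx_by_class, function(i) {
    # Index into i so that single-element classes are handled correctly.
    i[sample.int(length(i), size = floor(length(i) * prop.train))]
  }), use.names = FALSE)
  list(train = sort(train_idx), test = setdiff(seq_along(y), train_idx))
}

set.seed(42)
y <- rep(c("A", "B", "C"), each = 8)  # toy response: 3 classes, 8 samples each
parts <- split_by_class(y, prop.train = 3/4)

print(table(y[parts$train]))  # training samples per class
print(table(y[parts$test]))   # test samples per class
```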

Method `set_trainControl()`

Set the control parameters for the subsequent model training. See the trainControl function of the caret package for details.

Usage

trans_classifier$set_trainControl(
  method = "repeatedcv",
  classProbs = TRUE,
  savePredictions = TRUE,
  ...
)

Arguments

method: default 'repeatedcv', i.e. repeated k-fold cross-validation; see the method parameter of the caret::trainControl function for available options.
classProbs: default TRUE; whether class probabilities should be computed for classification models; see the classProbs parameter of the caret::trainControl function.
savePredictions: default TRUE; see the savePredictions parameter of the caret::trainControl function.
...: parameters passed to the trainControl function of the caret package.

Returns

trainControl in the object.

Examples

\dontrun{
t1$set_trainControl(method = 'repeatedcv')
}
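
Repeated k-fold cross-validation partitions the training samples into k folds and repeats that partitioning several times with different random assignments. Below is a base-R sketch of the fold assignment only, assuming k = 10 and 3 repeats purely for illustration. The helper make_repeated_folds is hypothetical, and caret's own fold construction may differ (for example, it can balance folds by class).

```r
# Hypothetical helper: assign each of n samples to one of k folds,
# repeated `repeats` times with a fresh random shuffle each time.
make_repeated_folds <- function(n, k = 10, repeats = 3) {
  lapply(seq_len(repeats), function(r) {
    sample(rep(seq_len(k), length.out = n))  # shuffled fold labels
  })
}

set.seed(123)
folds <- make_repeated_folds(n = 60, k = 10, repeats = 3)

# Within one repeat, every sample belongs to exactly one fold,
# so it is held out exactly once.
fold_sizes <- table(folds[[1]])
print(fold_sizes)
print(length(folds))
```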

Method `cal_train()`

Run the model training.

Usage

trans_classifier$cal_train(
  method = "rf",
  metric = "Accuracy",
  max.mtry = 2,
  max.ntree = 200,
  ...
)

Arguments

method: default "rf" (random forest); see the method parameter of the caret::train function for other options.
metric: default "Accuracy"; see the metric parameter of the caret::train function for other options.
max.mtry: default 2; for method = "rf"; maximum mtry used in the tuning grid for hyperparameter tuning.
max.ntree: default 200; for method = "rf"; maximum number of trees used to optimize the model.
...: parameters passed to the train function of the caret package.

Returns

res_train in the object.

Examples

\dontrun{
# random forest
t1$cal_train(method = "rf")
# Support Vector Machines with Radial Basis Function Kernel
t1$cal_train(method = "svmRadial", tuneLength = 15)
}
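
For method = "rf", caret tunes a single parameter, mtry (the number of predictors tried at each split). A minimal sketch of a tuning grid bounded by max.mtry is shown below; how cal_train actually builds its grid, and how max.ntree enters the search, is an assumption not confirmed by this page.

```r
# Assumed illustration: candidate mtry values from 1 up to max.mtry.
max.mtry <- 2
tune_grid <- expand.grid(mtry = seq_len(max.mtry))
print(tune_grid)
```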

Method `cal_feature_imp()`

Get feature importance from the trained model.

Usage

trans_classifier$cal_feature_imp(...)

Arguments

...: parameters passed to the varImp function of the caret package.

Returns

res_feature_imp, stored in the object, with one row for each predictor variable. The columns are the different importance measures.

Examples

\dontrun{
t1$cal_feature_imp()
}

Method `cal_predict()`

Run the prediction.

Usage

trans_classifier$cal_predict(positive_class = NULL)

Arguments

positive_class: default NULL; see the positive parameter of the confusionMatrix function in the caret package. If positive_class is NULL, the first group in the data is used as the positive class.

Returns

res_predict, res_confusion_fit and res_confusion_stats stored in the object.

Examples

\dontrun{
t1$cal_predict()
}
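
A confusion matrix cross-tabulates predicted against observed classes. The base-R sketch below uses made-up labels to show the table, the overall accuracy, and the sensitivity for one chosen positive class. caret::confusionMatrix reports many more statistics, and the vectors here are not microeco output.

```r
# Made-up observed and predicted labels for a 3-class problem.
classes <- c("A", "B", "C")
observed  <- factor(c("A", "A", "A", "B", "B", "B", "C", "C", "C"), levels = classes)
predicted <- factor(c("A", "A", "B", "B", "B", "C", "C", "C", "C"), levels = classes)

# Rows: predicted class; columns: observed (reference) class.
conf_mat <- table(Prediction = predicted, Reference = observed)

# Overall accuracy: correctly classified samples over all samples.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# One-vs-rest sensitivity (recall) for a chosen positive class, here "A".
positive <- "A"
sensitivity <- conf_mat[positive, positive] / sum(conf_mat[, positive])

print(conf_mat)
print(accuracy)
print(sensitivity)
```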

Method `plot_confusionMatrix()`

Plot the cross-tabulation of observed and predicted classes with associated statistics.

Usage

trans_classifier$plot_confusionMatrix(
  plot_confusion = TRUE,
  plot_statistics = TRUE
)

Arguments

plot_confusion: default TRUE; whether to plot the confusion matrix.
plot_statistics: default TRUE; whether to plot the statistics.

Returns

ggplot object.

Examples

\dontrun{
t1$plot_confusionMatrix()
}

Method `cal_ROC()`

Get the ROC (Receiver Operating Characteristic) curve data and the performance data.

Usage

trans_classifier$cal_ROC(input = "pred")

Arguments

input: default "pred"; 'pred' or 'train'; 'pred' uses the prediction results; 'train' uses the training results.

Returns

a list res_ROC stored in the object.

Examples

\dontrun{
t1$cal_ROC()
}
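
For intuition about what ROC curve data contain, here is a base-R sketch that computes ROC points and a trapezoidal AUC for one class against the rest. The helpers roc_points and auc_trapz and the toy scores are hypothetical; the structure of res_ROC is not reproduced here.

```r
# Hypothetical helper: ROC points for a binary (one-vs-rest) problem.
# labels: logical, TRUE for the positive class.
# scores: predicted probability of the positive class (assumed to have no ties).
roc_points <- function(labels, scores) {
  ord <- order(scores, decreasing = TRUE)
  labels <- labels[ord]
  tpr <- c(0, cumsum(labels) / sum(labels))    # sensitivity as the threshold is lowered
  fpr <- c(0, cumsum(!labels) / sum(!labels))  # 1 - specificity as the threshold is lowered
  data.frame(fpr = fpr, tpr = tpr)
}

# Trapezoidal area under the ROC curve.
auc_trapz <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

labels <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)

roc <- roc_points(labels, scores)
auc <- auc_trapz(roc)
print(roc)
print(auc)
```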

Method `plot_ROC()`

Plot the ROC or PR curve.

Usage

trans_classifier$plot_ROC(
  plot_type = c("ROC", "PR")[1],
  plot_group = "all",
  color_values = RColorBrewer::brewer.pal(8, "Dark2"),
  add_AUC = TRUE,
  plot_method = FALSE,
  ...
)

Arguments

plot_type: default c("ROC", "PR")[1]; 'ROC' represents the ROC (Receiver Operating Characteristic) curve; 'PR' represents the PR (Precision-Recall) curve.
plot_group: default "all"; 'all' represents all the classes in the model; 'add' represents all the classes plus the micro-average and macro-average results, see https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html; other options should be one or more class names, matching the names in the Group column of res_ROC$res_roc from the cal_ROC function.
color_values: default RColorBrewer::brewer.pal(8, "Dark2"); colors used in the plot.
add_AUC: default TRUE; whether to add the AUC to the legend.
plot_method: default FALSE; if TRUE, show the method in the legend even though only one method is found.
...: parameters passed to the geom_path function of the ggplot2 package.

Returns

ggplot2 object.

Examples

\dontrun{
t1$plot_ROC(size = 1, alpha = 0.7)
}
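
The micro-average and macro-average results available through plot_group = 'add' follow the usual multiclass convention described in the linked scikit-learn example: micro-averaging pools all one-vs-rest decisions before computing a metric, while macro-averaging computes the metric per class and then averages. The base-R sketch below applies this to AUC values rather than full curves, for brevity. The helper auc_rank and the probabilities are made up, and how microeco computes these averages internally is not shown on this page.

```r
# Hypothetical helper: rank-based AUC (Mann-Whitney statistic);
# tied scores receive average ranks.
auc_rank <- function(labels, scores) {
  r <- rank(scores)
  n_pos <- sum(labels)
  n_neg <- sum(!labels)
  (sum(r[labels]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up class probabilities for 6 samples and 3 classes (each row sums to 1).
prob <- matrix(c(0.7, 0.2, 0.1,
                 0.6, 0.3, 0.1,
                 0.2, 0.5, 0.3,
                 0.5, 0.3, 0.2,
                 0.1, 0.2, 0.7,
                 0.4, 0.1, 0.5),
               ncol = 3, byrow = TRUE,
               dimnames = list(NULL, c("A", "B", "C")))
truth <- c("A", "A", "B", "B", "C", "C")

# One-vs-rest indicator matrix: TRUE where a sample belongs to the column's class.
indicator <- outer(truth, colnames(prob), "==")

# Macro-average: AUC per class, then the unweighted mean over classes.
per_class_auc <- sapply(seq_len(ncol(prob)),
                        function(j) auc_rank(indicator[, j], prob[, j]))
macro_auc <- mean(per_class_auc)

# Micro-average: pool all (indicator, probability) pairs, then compute one AUC.
micro_auc <- auc_rank(as.vector(indicator), as.vector(prob))

print(per_class_auc)
print(macro_auc)
print(micro_auc)
```

Macro-averaging weights every class equally, whereas micro-averaging weights every sample-class decision equally, so the two can diverge when classes are imbalanced.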

Method `clone()`

The objects of this class are cloneable with this method.

Usage

trans_classifier$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

microeco

Microbial Community Ecology Data Analysis

v0.10.0

GPL-3

Authors

Chi Liu [aut, cre], Felipe R. P. Mansoldo [ctb], Umer Zeeshan Ijaz [ctb], Chenhao Li [ctb], Yang Cao [ctb], Minjie Yao [ctb], Xiangzhen Li [ctb]

Initial release
