
trans_classifier

Create trans_classifier object for machine-learning-based model prediction.


Description

This class wraps methods for machine-learning-based classification models, including data pre-processing, feature selection, data splitting, model training, prediction, the confusion matrix, and ROC (Receiver Operating Characteristic) or PR (Precision-Recall) curves.

Author(s): Felipe Mansoldo and Chi Liu

Methods

Public methods


Method new()

Create the trans_classifier object.

Usage
trans_classifier$new(
  dataset = NULL,
  x.predictors = "all",
  y.response = NULL,
  n.cores = 1
)
Arguments
dataset

an object of the microtable class.

x.predictors

default "all"; character string or data.frame; a character string selects the corresponding data from microtable$taxa_abund; a data.frame supplies other customized data. The available options are:

'all'

use all the taxa stored in microtable$taxa_abund

'Genus'

use the Genus-level table in microtable$taxa_abund; another taxonomic rank, e.g. 'Phylum', can be given instead

other input

must be a data.frame with the same format as the data.frames in microtable$taxa_abund, i.e. rows are features and columns are samples whose names match those in sample_table

y.response

default NULL; the response variable in sample_table.

n.cores

default 1; the number of CPU threads used.

Returns

data_feature and data_response stored in the object.

Examples
\donttest{
data(dataset)
t1 <- trans_classifier$new(
  dataset = dataset,
  x.predictors = "Genus",
  y.response = "Group")
}
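The data.frame option of x.predictors can be illustrated with a small sketch. The sample names, feature names and values below are hypothetical and only show the expected layout (features as rows, samples as columns); the check at the end is an illustration, not the validation performed by the package.

```r
# Hypothetical sample metadata: row names are sample IDs.
sample_table <- data.frame(
  Group = c("CK", "CK", "TR", "TR"),
  row.names = c("S1", "S2", "S3", "S4")
)

# Hypothetical custom predictors: rows are features, columns are samples.
custom_features <- data.frame(
  S1 = c(0.10, 5.2, 30),
  S2 = c(0.12, 5.0, 28),
  S3 = c(0.40, 6.1, 12),
  S4 = c(0.38, 6.3, 15),
  row.names = c("pH_shift", "moisture", "nitrate")
)

# Every column name should be a sample name found in sample_table.
samples_match <- all(colnames(custom_features) %in% rownames(sample_table))
samples_match
```

A table laid out this way would then be supplied as x.predictors = custom_features together with a microtable object whose sample_table uses the same sample names.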

Method cal_preProcess()

Pre-process (center, scale, etc.) the feature data with the caret::preProcess function. See https://topepo.github.io/caret/pre-processing.html for more details.

Usage
trans_classifier$cal_preProcess(...)
Arguments
...

parameters passed to the preProcess function of the caret package.

Returns

the converted data_feature stored in the object.

Examples
\dontrun{
t1$cal_preProcess(method = c("center", "scale", "nzv"))
}
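To make the effect of centering, scaling and variance filtering concrete, here is a minimal base R sketch on a toy table. It assumes samples as rows and features as columns, uses invented feature names, and replaces caret's "nzv" filter with a simpler zero-variance check, so it approximates the idea rather than reproducing caret::preProcess.

```r
# Toy feature table: rows are samples, columns are features (hypothetical data).
features <- data.frame(
  taxonA = c(1, 2, 3, 4),
  taxonB = c(10, 20, 30, 40),
  taxonC = c(5, 5, 5, 5)  # constant column, carries no information
)

# Drop zero-variance columns (a simplification of caret's "nzv" filter).
keep <- vapply(features, function(x) stats::var(x) > 0, logical(1))
filtered <- features[, keep, drop = FALSE]

# Center each column to mean 0 and scale it to standard deviation 1.
processed <- as.data.frame(scale(filtered, center = TRUE, scale = TRUE))

colnames(processed)
round(colMeans(processed), 10)
```

After this step every retained feature is on a comparable scale, which matters for distance- or kernel-based models such as support vector machines.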

Method cal_feature_sel()

Perform feature selection. See https://topepo.github.io/caret/feature-selection-overview.html for more details.

Usage
trans_classifier$cal_feature_sel(
  boruta.maxRuns = 300,
  boruta.pValue = 0.01,
  boruta.repetitions = 4,
  ...
)
Arguments
boruta.maxRuns

default 300; maximal number of importance source runs; passed to the maxRuns parameter of the Boruta function in the Boruta package.

boruta.pValue

default 0.01; p value passed to the pValue parameter of the Boruta function in the Boruta package.

boruta.repetitions

default 4; number of repeated runs of the feature selection.

...

parameters passed to the Boruta function of the Boruta package.

Returns

the optimized data_feature stored in the object.

Examples
\donttest{
t1$cal_feature_sel(boruta.maxRuns = 300, boruta.pValue = 0.01)
}

Method cal_split()

Split data for training and testing.

Usage
trans_classifier$cal_split(prop.train = 3/4)
Arguments
prop.train

default 3/4; the proportion of the dataset used for training.

Returns

data_train and data_test stored in the object.

Examples
\donttest{
t1$cal_split(prop.train = 3/4)
}
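The meaning of prop.train can be sketched in base R. The example below splits within each class so that both sets keep the class balance; this page does not state whether the package stratifies its split, so treat that as an assumption of the sketch. The response vector and the variable names are hypothetical.

```r
set.seed(42)

# Hypothetical response: 8 samples in each of two groups.
response <- factor(rep(c("CK", "TR"), each = 8))
prop_train <- 3/4

# Sample prop_train of the indices within each class separately.
idx_by_class <- split(seq_along(response), response)
train_idx <- sort(unlist(lapply(idx_by_class, function(idx) {
  idx[sample.int(length(idx), size = round(length(idx) * prop_train))]
}), use.names = FALSE))
test_idx <- setdiff(seq_along(response), train_idx)

table(response[train_idx])
table(response[test_idx])
```

With 16 samples and prop_train = 3/4, twelve samples go to training and four are held out for testing.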

Method set_trainControl()

Set the control parameters for the subsequent training. See the trainControl function of the caret package for details.

Usage
trans_classifier$set_trainControl(
  method = "repeatedcv",
  classProbs = TRUE,
  savePredictions = TRUE,
  ...
)
Arguments
method

default 'repeatedcv', i.e. repeated k-fold cross-validation; see the method parameter of the trainControl function in the caret package for other available options.

classProbs

default TRUE; whether class probabilities should be computed for classification models; see the classProbs parameter of the caret::trainControl function.

savePredictions

default TRUE; see the savePredictions parameter of the caret::trainControl function.

...

parameters passed to the trainControl function of the caret package.

Returns

trainControl stored in the object.

Examples
\dontrun{
t1$set_trainControl(method = 'repeatedcv')
}
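Repeated k-fold cross-validation can be pictured as drawing a fresh random fold assignment for each repeat. The base R sketch below shows only that resampling idea; the sizes are hypothetical and the code is not how caret builds its folds internally.

```r
set.seed(1)

n_samples <- 12  # hypothetical number of training samples
k <- 3           # folds per repeat
n_repeats <- 2   # number of repeats

# For each repeat, shuffle fold labels 1..k across the samples.
fold_ids <- lapply(seq_len(n_repeats), function(r) {
  sample(rep(seq_len(k), length.out = n_samples))
})
names(fold_ids) <- paste0("Rep", seq_len(n_repeats))

# Held-out sample indices for fold 1 of the first repeat.
which(fold_ids$Rep1 == 1)
```

In caret::trainControl the corresponding settings are the number and repeats arguments, which can presumably be supplied through ... here, e.g. number = 5, repeats = 3.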

Method cal_train()

Run the model training.

Usage
trans_classifier$cal_train(
  method = "rf",
  metric = "Accuracy",
  max.mtry = 2,
  max.ntree = 200,
  ...
)
Arguments
method

default "rf", i.e. random forest; see the method parameter of the caret::train function for other options.

metric

default "Accuracy"; see the metric parameter of the caret::train function for other options.

max.mtry

default 2; only for method = "rf"; maximum mtry used in the tuning grid for hyperparameter tuning of the model.

max.ntree

default 200; only for method = "rf"; maximum number of trees used to optimize the model.

...

parameters passed to the train function of the caret package.

Returns

res_train stored in the object.

Examples
\dontrun{
# random forest
t1$cal_train(method = "rf")
# Support Vector Machines with Radial Basis Function Kernel
t1$cal_train(method = "svmRadial", tuneLength = 15)
}
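As a rough illustration of what upper bounds such as max.mtry and max.ntree imply for tuning, the sketch below enumerates candidate settings with base R. How the class actually builds and searches its grid is not described on this page; the step of 100 trees and the variable names are assumptions made only for this example.

```r
max_mtry <- 2     # mirrors the max.mtry default above
max_ntree <- 200  # mirrors the max.ntree default above

# Candidate values; the step of 100 trees is an arbitrary choice for this sketch.
candidates <- expand.grid(
  mtry = seq_len(max_mtry),
  ntree = seq(100, max_ntree, by = 100)
)

candidates
nrow(candidates)
```

Each row is one combination a tuning procedure could evaluate by cross-validation, keeping the one with the best value of the chosen metric.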

Method cal_feature_imp()

Get feature importance from the training model.

Usage
trans_classifier$cal_feature_imp(...)
Arguments
...

parameters passed to the varImp function of the caret package.

Returns

res_feature_imp stored in the object, with one row for each predictor variable. The column(s) are different importance measures.

Examples
\dontrun{
t1$cal_feature_imp()
}

Method cal_predict()

Run the prediction.

Usage
trans_classifier$cal_predict(positive_class = NULL)
Arguments
positive_class

default NULL; see the positive parameter of the confusionMatrix function in the caret package; if positive_class is NULL, the first group in the data is used as the positive class automatically.

Returns

res_predict, res_confusion_fit and res_confusion_stats stored in the object.

Examples
\dontrun{
t1$cal_predict()
}
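The statistics reported after prediction derive from a cross-tabulation of predicted against observed classes. The base R sketch below uses hypothetical labels to show that table and a few of the derived values, taking the first class level as the positive class as described for positive_class = NULL. It is an illustration, not the output format of caret::confusionMatrix.

```r
# Hypothetical observed and predicted classes for six test samples.
observed  <- factor(c("CK", "CK", "CK", "TR", "TR", "TR"), levels = c("CK", "TR"))
predicted <- factor(c("CK", "CK", "TR", "TR", "TR", "CK"), levels = c("CK", "TR"))

# Cross-tabulate predictions against observations.
confusion <- table(Prediction = predicted, Reference = observed)

# Treat the first class as the positive class, mirroring positive_class = NULL.
positive <- levels(observed)[1]
negative <- setdiff(levels(observed), positive)

accuracy    <- sum(diag(confusion)) / sum(confusion)
sensitivity <- confusion[positive, positive] / sum(confusion[, positive])
specificity <- confusion[negative, negative] / sum(confusion[, negative])

confusion
c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity)
```

Choosing a different positive class swaps the roles of sensitivity and specificity, which is why the choice matters for two-class problems.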

Method plot_confusionMatrix()

Plot the cross-tabulation of observed and predicted classes with associated statistics.

Usage
trans_classifier$plot_confusionMatrix(
  plot_confusion = TRUE,
  plot_statistics = TRUE
)
Arguments
plot_confusion

default TRUE; whether to plot the confusion matrix.

plot_statistics

default TRUE; whether to plot the statistics.

Returns

ggplot object.

Examples
\dontrun{
t1$plot_confusionMatrix()
}

Method cal_ROC()

Get the ROC (Receiver Operating Characteristic) curve data and the performance data.

Usage
trans_classifier$cal_ROC(input = "pred")
Arguments
input

default "pred"; 'pred' or 'train'; 'pred' uses the prediction results; 'train' uses the training results.

Returns

a list res_ROC stored in the object.

Examples
\dontrun{
t1$cal_ROC()
}
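For readers unfamiliar with how ROC curve data arise, here is a minimal base R sketch for a two-class case. The probabilities and labels are hypothetical, ties between scores are assumed absent, and the code does not reproduce the structure of res_ROC.

```r
# Hypothetical predicted probabilities of the positive class and true labels.
prob  <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)
truth <- c(1, 1, 0, 1, 0, 0)  # 1 = positive, 0 = negative

# Sweep the threshold from high to low; each step adds one sample
# to the set predicted as positive.
ord <- order(prob, decreasing = TRUE)
tpr <- c(0, cumsum(truth[ord] == 1) / sum(truth == 1))
fpr <- c(0, cumsum(truth[ord] == 0) / sum(truth == 0))

# Area under the curve by the trapezoidal rule.
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

data.frame(fpr = fpr, tpr = tpr)
auc
```

The AUC equals the fraction of positive-negative pairs in which the positive sample receives the higher score, here 8 of 9 pairs.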

Method plot_ROC()

Plot the ROC or PR curve.

Usage
trans_classifier$plot_ROC(
  plot_type = c("ROC", "PR")[1],
  plot_group = "all",
  color_values = RColorBrewer::brewer.pal(8, "Dark2"),
  add_AUC = TRUE,
  plot_method = FALSE,
  ...
)
Arguments
plot_type

default c("ROC", "PR")[1]; 'ROC' represents the ROC (Receiver Operating Characteristic) curve; 'PR' represents the PR (Precision-Recall) curve.

plot_group

default "all"; 'all' represents all the classes in the model; 'add' represents all the classes plus the micro-average and macro-average results, see https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html; other options should be one or more class names, matching the names in the Group column of res_ROC$res_roc from the cal_ROC function.

color_values

default RColorBrewer::brewer.pal(8, "Dark2"); colors used in the plot.

add_AUC

default TRUE; whether to add the AUC to the legend.

plot_method

default FALSE; if TRUE, show the method in the legend even when only one method is found.

...

parameters passed to the geom_path function of the ggplot2 package.

Returns

ggplot2 object.

Examples
\dontrun{
t1$plot_ROC(size = 1, alpha = 0.7)
}
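The micro-average and macro-average results mentioned for plot_group = 'add' can be summarized as AUC values with a short base R sketch. It follows the usual one-vs-rest definitions; the class names and probabilities are hypothetical, the helper auc_rank is introduced only for this example, and the package may compute its averages differently.

```r
# AUC via the rank-based (Mann-Whitney) formula; ties receive average ranks.
auc_rank <- function(score, is_positive) {
  r <- rank(score)
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical class probabilities for six samples and three classes.
classes <- c("A", "B", "C")
truth <- c("A", "A", "B", "B", "C", "C")
prob <- matrix(
  c(0.7, 0.2, 0.1,
    0.4, 0.5, 0.1,
    0.2, 0.6, 0.2,
    0.3, 0.4, 0.3,
    0.1, 0.2, 0.7,
    0.2, 0.5, 0.3),
  ncol = 3, byrow = TRUE, dimnames = list(NULL, classes)
)

# One-vs-rest indicator matrix: TRUE where the sample belongs to the class.
indicator <- outer(truth, classes, "==")

# Macro-average: compute the AUC per class, then take the unweighted mean.
per_class <- vapply(seq_along(classes), function(j) {
  auc_rank(prob[, j], indicator[, j])
}, numeric(1))
macro_auc <- mean(per_class)

# Micro-average: pool every (sample, class) pair into one binary problem.
micro_auc <- auc_rank(as.vector(prob), as.vector(indicator))

c(macro = macro_auc, micro = micro_auc)
```

The macro-average weights every class equally, while the micro-average weights every sample-class decision equally, so the two can diverge when classes differ in size or difficulty.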

Method clone()

The objects of this class are cloneable with this method.

Usage
trans_classifier$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Examples

## ------------------------------------------------
## Method `trans_classifier$new`
## ------------------------------------------------


data(dataset)
t1 <- trans_classifier$new(
  dataset = dataset,
  x.predictors = "Genus",
  y.response = "Group")


## ------------------------------------------------
## Method `trans_classifier$cal_preProcess`
## ------------------------------------------------

## Not run: 
t1$cal_preProcess(method = c("center", "scale", "nzv"))

## End(Not run)

## ------------------------------------------------
## Method `trans_classifier$cal_feature_sel`
## ------------------------------------------------


t1$cal_feature_sel(boruta.maxRuns = 300, boruta.pValue = 0.01)


## ------------------------------------------------
## Method `trans_classifier$cal_split`
## ------------------------------------------------


t1$cal_split(prop.train = 3/4)


## ------------------------------------------------
## Method `trans_classifier$set_trainControl`
## ------------------------------------------------

## Not run: 
t1$set_trainControl(method = 'repeatedcv')

## End(Not run)

## ------------------------------------------------
## Method `trans_classifier$cal_train`
## ------------------------------------------------

## Not run: 
# random forest
t1$cal_train(method = "rf")
# Support Vector Machines with Radial Basis Function Kernel
t1$cal_train(method = "svmRadial", tuneLength = 15)

## End(Not run)

## ------------------------------------------------
## Method `trans_classifier$cal_feature_imp`
## ------------------------------------------------

## Not run: 
t1$cal_feature_imp()

## End(Not run)

## ------------------------------------------------
## Method `trans_classifier$cal_predict`
## ------------------------------------------------

## Not run: 
t1$cal_predict()

## End(Not run)

## ------------------------------------------------
## Method `trans_classifier$plot_confusionMatrix`
## ------------------------------------------------

## Not run: 
t1$plot_confusionMatrix()

## End(Not run)

## ------------------------------------------------
## Method `trans_classifier$cal_ROC`
## ------------------------------------------------

## Not run: 
t1$cal_ROC()

## End(Not run)

## ------------------------------------------------
## Method `trans_classifier$plot_ROC`
## ------------------------------------------------

## Not run: 
t1$plot_ROC(size = 1, alpha = 0.7)

## End(Not run)

microeco

Microbial Community Ecology Data Analysis

v0.10.0
GPL-3
Authors
Chi Liu [aut, cre], Felipe R. P. Mansoldo [ctb], Umer Zeeshan Ijaz [ctb], Chenhao Li [ctb], Yang Cao [ctb], Minjie Yao [ctb], Xiangzhen Li [ctb]
Initial release
