
trans_diff

Create a trans_diff object for differential analysis of taxonomic abundance.


Description

This class is a wrapper for a series of differential abundance tests and indicator analysis methods, including LEfSe based on Segata et al. (2011) <doi:10.1186/gb-2011-12-6-r60>, random forest based on An et al. (2019) <doi:10.1016/j.geoderma.2018.09.035>, metastat based on White et al. (2009) <doi:10.1371/journal.pcbi.1000352>, the method in the R package metagenomeSeq based on Paulson et al. (2013) <doi:10.1038/nmeth.2658>, the non-parametric Kruskal-Wallis rank sum test, Dunn's Kruskal-Wallis multiple comparisons based on the FSA package, Wilcoxon rank sum and signed rank tests, t-test and ANOVA.

Authors: Chi Liu, Yang Cao, Chenhao Li

Methods

Public methods


Method new()

Usage
trans_diff$new(
  dataset = NULL,
  method = c("lefse", "rf", "metastat", "mseq", "KW", "KW_dunn", "wilcox", "t.test",
    "anova")[1],
  group = NULL,
  taxa_level = "all",
  filter_thres = 0,
  alpha = 0.05,
  p_adjust_method = "fdr",
  lefse_subgroup = NULL,
  lefse_min_subsam = 10,
  lefse_norm = 1e+06,
  nresam = 0.6667,
  boots = 30,
  rf_ntree = 1000,
  group_choose_paired = NULL,
  mseq_count = 1,
  ...
)
Arguments
dataset

an object of the microtable class.

method

default "lefse"; see the following available options:

'lefse'

LEfSe method based on Segata et al. (2011) <doi:10.1186/gb-2011-12-6-r60>

'rf'

random forest and non-parametric test method based on An et al. (2019) <doi:10.1016/j.geoderma.2018.09.035>

'metastat'

Metastat method for all pairs of groups, based on White et al. (2009) <doi:10.1371/journal.pcbi.1000352>

'mseq'

zero-inflated log-normal model-based differential test method from the metagenomeSeq package.

'KW'

Kruskal-Wallis rank sum test across all groups (>= 2)

'KW_dunn'

Dunn's Kruskal-Wallis multiple comparisons when the number of groups is > 2; see the dunnTest function in the FSA package

'wilcox'

Wilcoxon rank sum and signed rank tests for all pairs of groups

't.test'

Student's t-test for all pairs of groups

'anova'

Duncan's multiple range test following ANOVA

group

default NULL; the sample group used for the comparison; a column name of microtable$sample_table.

taxa_level

default "all"; 'all' uses abundance data at all taxonomic ranks; to test at a specific rank, provide the rank name, such as "Genus". This parameter applies only when method != "mseq", because the 'mseq' method is performed on feature abundance, i.e. microtable$otu_table.

filter_thres

default 0; the relative abundance threshold, used when method is neither "metastat" nor "mseq".

alpha

default 0.05; significance threshold for method = "lefse" or "rf", used to select taxa that differ significantly across groups.

p_adjust_method

default "fdr"; the p-value adjustment method; see the method parameter of the p.adjust function for other available options. NULL disables p-value adjustment, in which case P.adj is identical to P.unadj.

lefse_subgroup

default NULL; the sample subgroup used for sub-comparisons in LEfSe; see Segata et al. (2011) <doi:10.1186/gb-2011-12-6-r60>.

lefse_min_subsam

default 10; the minimum number of samples required in each subgroup for the subgroup test.

lefse_norm

default 1000000; the normalization (scaling) value used in LEfSe.

nresam

default 0.6667; the fraction of samples used in each bootstrap iteration for method = "lefse" or "rf".

boots

default 30; the number of bootstrap iterations for method = "lefse" or "rf".

rf_ntree

default 1000; see the ntree parameter of the randomForest function in the randomForest package; used when method = "rf".

group_choose_paired

default NULL; a vector used to select the groups for pairwise testing; only used when method = "metastat" or "mseq".

mseq_count

default 1; keep only features with at least this many counts; see the count parameter of the MRcoefs function in the metagenomeSeq package.

...

parameters passed to the cal_diff function of the trans_alpha class when method is one of "KW", "KW_dunn", "wilcox", "t.test" and "anova".
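As orientation only (this is not microeco's internal code), the 'KW' and 'wilcox' options correspond to tests in R's stats package, and p_adjust_method is handed to stats::p.adjust. The sketch below simulates one taxon in three hypothetical groups A, B and C, runs one Kruskal-Wallis test across all groups and one Wilcoxon rank sum test per pair of groups, then adjusts the pairwise p values with a hypothetical helper adjust_p() that mirrors the documented NULL behaviour.

```r
# Simulated abundance of one taxon in three hypothetical groups
set.seed(1)
df <- data.frame(
  Group = rep(c("A", "B", "C"), each = 6),
  Value = c(rnorm(6, mean = 10), rnorm(6, mean = 12), rnorm(6, mean = 15))
)

# 'KW'-style: one Kruskal-Wallis rank sum test across all groups
kw_p <- kruskal.test(Value ~ Group, data = df)$p.value

# 'wilcox'-style: one Wilcoxon rank sum test for every pair of groups
pairs <- combn(unique(df$Group), 2, simplify = FALSE)
pair_p <- vapply(pairs, function(g) {
  wilcox.test(df$Value[df$Group == g[1]], df$Value[df$Group == g[2]])$p.value
}, numeric(1))
names(pair_p) <- vapply(pairs, paste, character(1), collapse = " - ")

# Hypothetical helper: NULL skips adjustment, otherwise use stats::p.adjust()
adjust_p <- function(p, p_adjust_method = "fdr") {
  if (is.null(p_adjust_method)) return(p)
  p.adjust(p, method = p_adjust_method)  # "fdr" is an alias for "BH"
}
p_adj  <- adjust_p(pair_p, "fdr")
p_same <- adjust_p(pair_p, NULL)         # P.adj identical to P.unadj

print(kw_p)
print(cbind(P.unadj = pair_p, P.adj = p_adj))
```

Benjamini-Hochberg adjustment never lowers a p value, so P.adj is always at least P.unadj when an adjustment method is set.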

Returns

res_diff and res_abund, stored in the object.
res_abund includes the mean abundance of each taxon (Mean), standard deviation (SD), standard error (SE) and sample number (N) in each group (Group).
res_diff is the detailed differential test result, containing:
"Comparison": the groups being compared, either all groups or a pair of groups; if this column is absent, all groups were used;
"Group": the group with the maximum median or mean value across the tested groups (median for non-parametric methods, mean for t.test);
"Taxa": the taxon tested in this comparison;
"Method": the test method used, depending on the method input;
"LDA" or "MeanDecreaseGini": LDA is the linear discriminant analysis score in LEfSe; MeanDecreaseGini is the mean decrease in Gini index in random forest;
"P.unadj" and "P.adj": P.unadj is the raw p value; P.adj is the adjusted p value;
"qvalue": the q value for metastat analysis.

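The summary statistics described for res_abund can be illustrated in base R. The sketch below uses a made-up data frame abund for one hypothetical taxon and computes Mean, SD, SE (as SD / sqrt(N)) and N per group, plus the group with the highest median, which is how the "Group" column is described for non-parametric methods. It is an illustration under these assumptions, not the package's implementation.

```r
# Made-up relative abundances of one hypothetical taxon in two groups
abund <- data.frame(
  Taxa  = "g__Example",
  Group = rep(c("A", "B"), each = 4),
  Value = c(0.10, 0.12, 0.08, 0.11, 0.02, 0.03, 0.01, 0.04)
)

# Mean, SD, SE = SD / sqrt(N) and N for each taxon-group combination
keys <- unique(abund[, c("Taxa", "Group")])
res_abund_sketch <- do.call(rbind, lapply(seq_len(nrow(keys)), function(i) {
  v <- abund$Value[abund$Taxa == keys$Taxa[i] & abund$Group == keys$Group[i]]
  data.frame(Taxa = keys$Taxa[i], Group = keys$Group[i],
             Mean = mean(v), SD = sd(v), SE = sd(v) / sqrt(length(v)),
             N = length(v))
}))

# Group with the highest median (as described for non-parametric methods)
medians   <- tapply(abund$Value, abund$Group, median)
top_group <- names(which.max(medians))

print(res_abund_sketch)
print(top_group)
```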
Examples
\donttest{
library(microeco)
data(dataset)
t1 <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group")
t1 <- trans_diff$new(dataset = dataset, method = "rf", group = "Group")
t1 <- trans_diff$new(dataset = dataset, method = "metastat", group = "Group", taxa_level = "Genus")
t1 <- trans_diff$new(dataset = dataset, method = "wilcox", group = "Group")
t1 <- trans_diff$new(dataset = dataset, method = "KW_dunn", group = "Group", taxa_level = "Phylum")
}
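Some of the examples above test at the "Genus" or "Phylum" rank through taxa_level. Conceptually, testing at a rank works on feature abundances summed per taxon at that rank. A minimal base R sketch of that aggregation step, using a made-up count matrix otu and a made-up genus assignment vector genus (both hypothetical; this is not microeco's own code):

```r
# Hypothetical feature table: rows = features (OTUs), columns = samples
otu <- matrix(c(5, 0, 3,
                2, 4, 1,
                7, 1, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("OTU1", "OTU2", "OTU3"), c("S1", "S2", "S3")))

# Hypothetical genus assignment of each feature
genus <- c("g__Bacillus", "g__Bacillus", "g__Pseudomonas")

# Sum the counts of all features in the same genus, per sample
genus_abund <- rowsum(otu, group = genus)

# Convert to relative abundance within each sample (columns sum to 1)
genus_rel <- sweep(genus_abund, 2, colSums(genus_abund), "/")
print(genus_rel)
```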

Method plot_diff_abund()

Plot the abundance of differential taxa.

Usage
trans_diff$plot_diff_abund(
  use_number = 1:20,
  color_values = RColorBrewer::brewer.pal(8, "Dark2"),
  select_group = NULL,
  select_taxa = NULL,
  simplify_names = TRUE,
  keep_prefix = TRUE,
  group_order = NULL,
  barwidth = 0.9,
  use_se = TRUE,
  add_sig = FALSE,
  add_sig_label = "Significance",
  add_sig_label_color = "black",
  add_sig_tip_length = 0.01,
  y_start = 1.01,
  y_increase = 0.05,
  text_y_size = 10,
  coord_flip = TRUE,
  ...
)
Arguments
use_number

default 1:20; numeric vector; the indices (1:n) of the taxa used in the plot; if n is larger than the total number of significant taxa, all significant taxa are used.

color_values

default RColorBrewer::brewer.pal(8, "Dark2"); color palette.

select_group

default NULL; used to select a pair of groups. This parameter is especially useful when the comparison method is applied to pairs of groups. The input select_group must be one of object$res_diff$Comparison.

select_taxa

default NULL; character vector of taxa names. The names must match those shown in the plot, not the values in the 'Taxa' column (object$res_diff$Taxa).

simplify_names

default TRUE; whether to use simplified taxonomic names.

keep_prefix

default TRUE; whether to retain the taxonomic prefix.

group_order

default NULL; a vector used to order groups, i.e. to reorder the legend and colors in the plot. If NULL, the function first checks whether the group column of sample_table is a factor and, if so, uses its levels. If provided, it overrides the levels of the group column in sample_table.

barwidth

default 0.9; the bar width in plot.

use_se

default TRUE; whether to show SE in the plot; if FALSE, SD is used.

add_sig

default FALSE; whether to add significance labels to the plot.

add_sig_label

default "Significance"; a column name of object$res_diff used for the label text, such as 'P.adj' or 'Significance'.

add_sig_label_color

default "black"; the color for the label text when add_sig = TRUE.

add_sig_tip_length

default 0.01; the tip length for the added line when add_sig = TRUE.

y_start

default 1.01; the y-axis position from which labels are added; the default 1.01 means 1.01 * Value. For method != "anova", all labels share the same start position, i.e. Value = max(Mean + SD or Mean + SE); for method = "anova", the start position is calculated for each point, i.e. Value = Mean + SD or Mean + SE.

y_increase

default 0.05; the additional y-axis space used to add labels for pairs of groups; the default 0.05 means 0.05 * y_start * Value. This parameter is also used to place the letters of the ANOVA result at the fixed position (1 + y_increase) * y_start * Value.

text_y_size

default 10; the size for the y axis text.

coord_flip

default TRUE; whether to flip Cartesian coordinates so that horizontal becomes vertical and vertical becomes horizontal.

...

parameters passed to ggsignif::stat_signif when add_sig = TRUE.
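The y_start and y_increase descriptions define label heights as multiples of Value. A minimal sketch of that arithmetic, using two hypothetical helpers, sig_label_y() and anova_letter_y(); stacking successive pairwise labels linearly by y_increase * y_start * Value is an assumption about how the increments accumulate, not a statement of the package's code.

```r
# Hypothetical helper: y positions of stacked significance labels.
# value is Mean + SE (or Mean + SD), as described for y_start.
sig_label_y <- function(value, n_pairs, y_start = 1.01, y_increase = 0.05) {
  base_y <- y_start * value               # first label at y_start * Value
  step   <- y_increase * y_start * value  # documented increment
  base_y + (seq_len(n_pairs) - 1) * step  # assumed linear stacking
}

# Hypothetical helper: fixed y position of ANOVA letters
anova_letter_y <- function(value, y_start = 1.01, y_increase = 0.05) {
  (1 + y_increase) * y_start * value
}

pair_y   <- sig_label_y(value = 2, n_pairs = 3)
letter_y <- anova_letter_y(value = 2)
print(pair_y)
print(letter_y)
```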

Returns

ggplot.

Examples
\donttest{
t1 <- trans_diff$new(dataset = dataset, method = "anova", group = "Group", taxa_level = "Genus")
t1$plot_diff_abund(use_number = 1:10)
t1$plot_diff_abund(use_number = 1:10, add_sig = TRUE)
t1 <- trans_diff$new(dataset = dataset, method = "wilcox", group = "Group")
t1$plot_diff_abund(use_number = 1:20)
t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE)
t1 <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group")
t1$plot_diff_abund(use_number = 1:20)
t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE)
}

Method plot_diff_bar()

Bar plot of LDA scores.

Usage
trans_diff$plot_diff_bar(
  color_values = RColorBrewer::brewer.pal(8, "Dark2"),
  use_number = 1:10,
  threshold = NULL,
  select_group = NULL,
  simplify_names = TRUE,
  keep_prefix = TRUE,
  group_order = NULL,
  axis_text_y = 12,
  plot_vertical = TRUE,
  ...
)
Arguments
color_values

default RColorBrewer::brewer.pal(8, "Dark2"); color palette for different groups.

use_number

default 1:10; numeric vector; the indices of the taxa used in the plot, i.e. 1:n.

threshold

default NULL; threshold value for selecting taxa, such as 3 for LDA score of LEfSe.

select_group

default NULL; used to select a pair of groups when multiple comparisons are generated. The input select_group must be one of object$res_diff$Comparison.

simplify_names

default TRUE; whether to use simplified taxonomic names.

keep_prefix

default TRUE; whether to retain the taxonomic prefix.

group_order

default NULL; a vector used to order the legend and colors in the plot. If NULL, the function first checks whether the group column of sample_table is a factor and, if so, uses its levels. If provided, it overrides the levels of the group column in sample_table.

axis_text_y

default 12; the size for the y axis text.

plot_vertical

default TRUE; whether to draw a vertical (TRUE) or horizontal (FALSE) bar plot.

...

parameters passed to ggplot2::geom_bar.
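The group_order argument, shared by plot_diff_abund, plot_diff_bar and plot_diff_cladogram, follows a simple precedence: an explicit vector wins, otherwise the factor levels of the group column are used. A minimal sketch with a hypothetical helper resolve_group_order(); the behaviour when the column is not a factor is not documented here, so the alphabetical fallback below is an assumption.

```r
# Hypothetical helper sketching the documented precedence for group_order
resolve_group_order <- function(group_col, group_order = NULL) {
  if (!is.null(group_order)) return(group_order)       # explicit order wins
  if (is.factor(group_col)) return(levels(group_col))  # otherwise factor levels
  sort(unique(as.character(group_col)))                # assumed fallback
}

grp <- factor(c("C", "A", "B", "A"), levels = c("B", "A", "C"))
order_from_factor <- resolve_group_order(grp)
order_explicit    <- resolve_group_order(grp, c("A", "B", "C"))
order_fallback    <- resolve_group_order(c("C", "A", "B"))
print(order_from_factor)
```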

Returns

ggplot.

Examples
\donttest{
t1$plot_diff_bar(use_number = 1:20)
}

Method plot_diff_cladogram()

Plot a cladogram of taxa with significant differences.

Usage
trans_diff$plot_diff_cladogram(
  color = RColorBrewer::brewer.pal(8, "Dark2"),
  use_taxa_num = 200,
  filter_taxa = NULL,
  use_feature_num = NULL,
  group_order = NULL,
  clade_label_level = 4,
  select_show_labels = NULL,
  only_select_show = FALSE,
  sep = "|",
  branch_size = 0.2,
  alpha = 0.2,
  clade_label_size = 2,
  clade_label_size_add = 5,
  clade_label_size_log = exp(1),
  node_size_scale = 1,
  node_size_offset = 1,
  annotation_shape = 22,
  annotation_shape_size = 5
)
Arguments
color

default RColorBrewer::brewer.pal(8, "Dark2"); color palette used in the plot.

use_taxa_num

default 200; integer; the number of taxa used in the background tree; taxa are selected by mean abundance.

filter_taxa

default NULL; the mean relative abundance threshold used to filter out low-abundance taxa.

use_feature_num

default NULL; integer; the number of features used in the plot; features are selected by LDA score (method = "lefse") or MeanDecreaseGini (method = "rf"), from high to low.

group_order

default NULL; a vector used to order the legend and colors in the plot. If NULL, the function first checks whether the group column of sample_table is a factor and, if so, uses its levels. If provided, it overrides the levels of the group column in sample_table.

clade_label_level

default 4; the taxonomic level at which clade labels are marked with letters; the root corresponds to the largest level.

select_show_labels

default NULL; character vector; The features to show in the plot with full label names, not the letters.

only_select_show

default FALSE; whether to show only the features given in select_show_labels.

sep

default "|"; the separator character in the taxonomic information.

branch_size

default 0.2; numeric; the size of the branches.

alpha

default 0.2; the transparency (alpha) of the shading color.

clade_label_size

default 2; the base size of the clade labels; see also clade_label_size_add and clade_label_size_log.

clade_label_size_add

default 5; the value added to the clade label level before taking the logarithm; see the formula in the clade_label_size_log parameter.

clade_label_size_log

default exp(1); the base of the logarithm used to compute the added size of the clade labels. The size formula is clade_label_size + log(clade_label_level + clade_label_size_add, base = clade_label_size_log), so clade_label_size_log, clade_label_size_add and clade_label_size together fully control the label size at different taxonomic levels.

node_size_scale

default 1; scale for the node size.

node_size_offset

default 1; offset for the node size.

annotation_shape

default 22; shape used in the annotation legend.

annotation_shape_size

default 5; size used in the annotation legend.
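Two of the cladogram rules above can be sketched in base R. The first part evaluates the documented label size formula for a few label levels; the second selects background taxa by mean abundance after filtering with filter_taxa. The vector mean_abund is made up, and treating filter_taxa as an inclusive lower bound (>=) is an assumption; this is not the package's own code.

```r
# Part 1: documented clade label size formula, with the documented defaults
clade_label_size     <- 2
clade_label_size_add <- 5
clade_label_size_log <- exp(1)
label_levels <- 1:5   # hypothetical label levels
label_size <- clade_label_size +
  log(label_levels + clade_label_size_add, base = clade_label_size_log)
print(round(label_size, 2))

# Part 2: choose background taxa by mean relative abundance (made-up values)
mean_abund <- c(taxonA = 0.20, taxonB = 0.001, taxonC = 0.05,
                taxonD = 0.0004, taxonE = 0.10)
filter_taxa  <- 0.001
use_taxa_num <- 3
kept <- mean_abund[mean_abund >= filter_taxa]  # assumed inclusive filter
n_keep <- min(use_taxa_num, length(kept))
top_taxa <- names(sort(kept, decreasing = TRUE))[seq_len(n_keep)]
print(top_taxa)
```

Because log() is increasing, larger label levels always receive larger labels; clade_label_size_add and clade_label_size_log only change how quickly the size grows.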

Returns

ggplot.

Examples
\donttest{
t1$plot_diff_cladogram(use_taxa_num = 100, use_feature_num = 30, select_show_labels = NULL)
}

Method print()

Print the trans_diff object.

Usage
trans_diff$print()

Method clone()

The objects of this class are cloneable with this method.

Usage
trans_diff$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


microeco

Microbial Community Ecology Data Analysis

v0.10.0
GPL-3
Authors
Chi Liu [aut, cre], Felipe R. P. Mansoldo [ctb], Umer Zeeshan Ijaz [ctb], Chenhao Li [ctb], Yang Cao [ctb], Minjie Yao [ctb], Xiangzhen Li [ctb]
Initial release
