Depth-Based Classification and Calculation of Data Depth
The package provides procedures for calculating the depth of points with respect to an empirical distribution, for many notions of data depth. It also implements depth-based classification for multivariate and functional data.
The package implements the DDα-classifier (Lange, Mosler and Mozharovskyi, 2014), a nonparametric procedure for supervised classification with q ≥ 2 classes. In the training step, the sample is first transformed into a q-dimensional cube of depth vectors; a linear separation rule is then constructed in a polynomial extension of this depth space with the α-procedure. In the classification step, 'outsiders' (objects that cannot be classified by their depth values) are handled by one of several alternative treatments.
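The depth transform described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: it uses the Mahalanobis depth, 1 / (1 + squared Mahalanobis distance), to map a two-class sample into the unit square, and then applies a naive maximum-depth rule in place of the α-procedure. The helper mahalanobis_depth and all variable names are hypothetical.

```r
# Sketch: map bivariate points into a 2-dimensional depth space (a DD-plot)
# using the Mahalanobis depth, then apply a naive maximum-depth rule.
# This is NOT the alpha-procedure; all names below are hypothetical.
set.seed(1)

# Mahalanobis depth of the rows of x with respect to the sample `data`:
# 1 / (1 + squared Mahalanobis distance to the sample mean)
mahalanobis_depth <- function(x, data) {
  d2 <- mahalanobis(x, center = colMeans(data), cov = cov(data))
  1 / (1 + d2)
}

class1 <- matrix(rnorm(200), ncol = 2)            # 100 points around (0, 0)
class2 <- matrix(rnorm(200, mean = 3), ncol = 2)  # 100 points around (3, 3)
train  <- rbind(class1, class2)
labels <- rep(1:2, each = 100)

# Depth space: one row per training object, one column per class
dd <- cbind(mahalanobis_depth(train, class1),
            mahalanobis_depth(train, class2))

# Maximum-depth rule: assign each object to the class in which it is deepest
predicted  <- max.col(dd, ties.method = "first")
error_rate <- mean(predicted != labels)
```

Each row of dd is a depth vector in the unit square; with q classes the same construction yields q columns. The DDα-classifier replaces the maximum-depth rule used here with a separating rule learned in the depth space.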
Package: ddalpha
Type: Package
Version: 1.3.11
Date: 2020-01-09
License: GPL-2
Use ddalpha.train to train the DD-classifier and ddalpha.classify to classify with it.
Load sample classification problems using getdata. The package contains 50 classification problems built from 33 sets of real data.
The implemented multivariate depths are listed in topic depth. and the functional depths in topic depthf. (the trailing period is part of each topic name). The depth representations of multivariate data are obtained with depth.space. (likewise named with a trailing period). Functions depth.contours and depth.contours.ddalpha build depth contours, and depth.graph builds depth graphs for two-dimensional data. Function draw.ddplot draws the DD-plot for an existing DD-classifier or for a pre-calculated depth space.
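To make concrete what one of these depth notions computes, here is a minimal base-R sketch of the random Tukey (halfspace) depth: the data are projected onto random directions, and the depth of a point is approximated by the smallest univariate halfspace depth found over those directions. This is an illustration under simplifying assumptions, not the package's algorithm; random_tukey_depth and its num_directions parameter are hypothetical names.

```r
# Sketch: approximate the Tukey (halfspace) depth by random projections.
# random_tukey_depth and num_directions are hypothetical names.
random_tukey_depth <- function(x, data, num_directions = 200) {
  x <- matrix(x, ncol = ncol(data))  # accept a single point or a matrix of points
  # Random unit directions, one per column
  dirs <- matrix(rnorm(ncol(data) * num_directions), nrow = ncol(data))
  dirs <- sweep(dirs, 2, sqrt(colSums(dirs^2)), "/")
  proj_data <- data %*% dirs         # projections of the sample
  proj_x    <- x %*% dirs            # projections of the query points
  depths <- rep(1, nrow(x))
  for (k in seq_len(num_directions)) {
    # Univariate halfspace depth along direction k
    below <- sapply(proj_x[, k], function(z) mean(proj_data[, k] <= z))
    above <- sapply(proj_x[, k], function(z) mean(proj_data[, k] >= z))
    depths <- pmin(depths, below, above)
  }
  depths
}

set.seed(1)
sample_data  <- matrix(rnorm(400), ncol = 2)  # 200 bivariate normal points
depth_center <- random_tukey_depth(c(0, 0), sample_data)
depth_far    <- random_tukey_depth(c(10, 10), sample_data)
```

A point near the centre of the cloud gets a depth close to 0.5, while a point far outside the sample gets depth 0. More directions tighten the approximation, which can only overestimate the exact depth.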
The package supports user-defined depths and classifiers; see topic Custom Methods. A pre-calculated DD-plot may also be used as data; see topic ddalpha.train.
is.in.convex shows whether an object is not an 'outsider', i.e. whether it can be classified by its depth values.
Outsiders are alternatively classified by LDA, kNN and maximum Mahalanobis depth as well as by random assignment.
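The idea behind the outsider check can be sketched for bivariate data in base R: a test point is flagged as an outsider when it lies outside the convex hull of every class. This is a hedged illustration only and does not reproduce the package's is.in.convex; the helper in_hull and the variable names are hypothetical, and the hull test assumes two-dimensional data.

```r
# Sketch: flag bivariate test points lying outside the convex hull of every class.
# in_hull is a hypothetical helper, valid for two-dimensional data only.
in_hull <- function(p, pts, tol = 1e-12) {
  hull <- pts[chull(pts), , drop = FALSE]           # hull vertices in order
  nxt  <- hull[c(2:nrow(hull), 1), , drop = FALSE]  # next vertex along the hull
  # Cross product of each hull edge with the vector from its start vertex to p
  cross <- (nxt[, 1] - hull[, 1]) * (p[2] - hull[, 2]) -
           (nxt[, 2] - hull[, 2]) * (p[1] - hull[, 1])
  # p is inside (or on the boundary) if it lies on the same side of all edges
  all(cross <= tol) || all(cross >= -tol)
}

set.seed(1)
class1 <- matrix(rnorm(200), ncol = 2)            # 100 points around (0, 0)
class2 <- matrix(rnorm(200, mean = 2), ncol = 2)  # 100 points around (2, 2)
test_points <- rbind(c(1, 1), c(10, 10))

# One row per test point, one column per class
membership <- t(apply(test_points, 1, function(p)
  c(in_hull(p, class1), in_hull(p, class2))))

# An outsider belongs to no class hull
is_outsider <- rowSums(membership) == 0
```

The final line mirrors the way the outsider count is obtained from is.in.convex in the package example: a row of the membership matrix that sums to zero marks an object outside all class hulls.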
Use compclassf.train and ddalphaf.train to train the functional DD-classifiers, and compclassf.classify and ddalphaf.classify to classify with them. Load sample functional classification problems with dataf.*. The package contains 4 functional data sets and 2 data set generators. Functional data are visualized with plot.functional.
Oleksii Pokotylo, <alexey.pokotylo at gmail.com>
Pavlo Mozharovskyi, <pavlo.mozharovskyi at ensai.fr>
Rainer Dyckerhoff, <rainer.dyckerhoff at statistik.uni-koeln.de>
Stanislav Nagy, <nagy at karlin.mff.cuni.cz>
Pokotylo, O., Mozharovskyi, P., Dyckerhoff, R. (2019). Depth and depth-based classification with R-package ddalpha. Journal of Statistical Software 91 1–46.
Lange, T., Mosler, K., and Mozharovskyi, P. (2014). Fast nonparametric classification based on data depth. Statistical Papers 55 49–69.
Lange, T., Mosler, K., and Mozharovskyi, P. (2014). DDα-classification of asymmetric and fat-tailed data. In: Spiliopoulou, M., Schmidt-Thieme, L., Janning, R. (eds), Data Analysis, Machine Learning and Knowledge Discovery, Springer (Berlin), 71–78.
Mosler, K. and Mozharovskyi, P. (2017). Fast DD-classification of functional data. Statistical Papers 58 1055–1089.
Mozharovskyi, P. (2015). Contributions to Depth-based Classification and Computation of the Tukey Depth. Verlag Dr. Kovac (Hamburg).
Mozharovskyi, P., Mosler, K., and Lange, T. (2015). Classifying real-world data with the DDα-procedure. Advances in Data Analysis and Classification 9 287–314.
Nagy, S., Gijbels, I. and Hlubinka, D. (2017). Depth-based recognition of shape outlying functions. Journal of Computational and Graphical Statistics. To appear.
library(MASS)     # provides mvrnorm
library(ddalpha)

# Generate a bivariate normal location-shift classification task
# containing 200 training objects and 200 to test with
class1 <- mvrnorm(200, c(0,0),
                  matrix(c(1,1,1,4), nrow = 2, ncol = 2, byrow = TRUE))
class2 <- mvrnorm(200, c(2,2),
                  matrix(c(1,1,1,4), nrow = 2, ncol = 2, byrow = TRUE))
trainIndices <- c(1:100)
testIndices <- c(101:200)
propertyVars <- c(1:2)
classVar <- 3
trainData <- rbind(cbind(class1[trainIndices,], rep(1, 100)),
                   cbind(class2[trainIndices,], rep(2, 100)))
testData <- rbind(cbind(class1[testIndices,], rep(1, 100)),
                  cbind(class2[testIndices,], rep(2, 100)))
data <- list(train = trainData, test = testData)

# Train the DDalpha-classifier
ddalpha <- ddalpha.train(data$train)

# Classify by means of DDalpha-classifier
classes <- ddalpha.classify(ddalpha, data$test[,propertyVars])
cat("Classification error rate:",
    sum(unlist(classes) != data$test[,classVar])/200, "\n")

# Calculate zonoid depth of top 10 testing objects w.r.t. 1st class
depths.zonoid <- depth.zonoid(data$test[1:10,propertyVars],
                              data$train[trainIndices,propertyVars])
cat("Zonoid depths:", depths.zonoid, "\n")

# Calculate the random Tukey depth of top 10 testing objects w.r.t. 1st class
depths.halfspace <- depth.halfspace(data$test[1:10,propertyVars],
                                    data$train[trainIndices,propertyVars])
cat("Random Tukey depths:", depths.halfspace, "\n")

# Calculate depth space with zonoid depth
dspace.zonoid <- depth.space.zonoid(data$train[,propertyVars], c(100, 100))

# Calculate depth space with the exact Tukey depth
dspace.halfspace <- depth.space.halfspace(data$train[,propertyVars],
                                          c(100, 100), exact = TRUE)

# Count outsiders: test objects outside the convex hull of every class
numOutsiders <- sum(rowSums(is.in.convex(data$test[,propertyVars],
                                         data$train[,propertyVars],
                                         c(100, 100))) == 0)
cat(numOutsiders, "outsiders found in the testing sample.\n")