Create Simulated Data for Seriation Evaluation
Several functions to create simulated data to evaluate different aspects of seriation algorithms and criterion functions.
create_lines_data(n = 250) create_ordered_data(n = 250, k = 2, size = NULL, spacing = 6, path = "linear", sd1 = 1, sd2 = 0)
n |
number of data points to create. |
k |
number of Gaussian components. |
size |
relative size (number of points) of components (length of k).
If |
spacing |
space between the centers of components. The default of 6
means that the components will barely touch at |
path |
Are the components arranged along a |
sd1 |
variation in the direction along the components. A value greater than one means the components are mixing. |
sd2 |
variation perpendicular to the direction along the components. A value greater than 0 will introduce anti-Robinson violation events. |
create_lines_data
creates the lines data set used in
for iVAT in Havens and Bezdeck (2012).
create_ordered_data
is a versatile function which creates "orderable"
2D data using Gaussian components along a linear or circular path. The
components are equally spaced (spacing
) along the path. The
default spacing of 6 ensures that 2 adjacent components with a standard
deviation of one along the direction of the path will barely touch. The
standard deviation along the path is set by sd1
. The standard deviation
perpendicular to the path is set by sd2
. A value larger than zero
will result in the data not being perfectly orderable (i.e., the
resulting distance matrix will not be a perfect pre-anti-Robinson matrix and
contain anti-Robinson violation events after seriation). Note that a circular
path always creates anti-Robinson violation since the circle has to be
broken at some point to create a linear order.
Michael Hahsler
Havens, T.C. and Bezdek, J.C. (2012): An Efficient Formulation of the Improved Visual Assessment of Cluster Tendency (iVAT) Algorithm, IEEE Transactions on Knowledge and Data Engineering, 24(5), 813–822.
## lines data set from Havens and Bezdek (2011) x <- create_lines_data(250) plot(x, xlim=c(-5,5), ylim=c(-3,3), cex=.2, col = attr(x, "id")) d <- dist(x) pimage(d, seriate(d, "OLO_single"), col = bluered(100, bias=.5), key = TRUE) ## create_ordered_data can produce many types of "orderable" data ## perfect pre-Anti-Robinson matrix (with a single components) x <- create_ordered_data(250, k = 1) plot(x, cex=.2, col = attr(x, "id")) d <- dist(x) pimage(d, seriate(d, "MDS"), col = bluered(100, bias=.5), key = TRUE) ## separated components x <- create_ordered_data(250, k = 5) plot(x, cex=.2, col = attr(x, "id")) d <- dist(x) pimage(d, seriate(d, "MDS"), col = bluered(100, bias=.5), key = TRUE) ## overlapping components x <- create_ordered_data(250, k = 5, sd1 = 2) plot(x, cex=.2, col = attr(x, "id")) d <- dist(x) pimage(d, seriate(d, "MDS"), col = bluered(100, bias=.5), key = TRUE) ## introduce anti-Robinson violations (a non-zero y value) x <- create_ordered_data(250, k = 5, sd1 = 2, sd2 = 5) plot(x, cex=.2, col = attr(x, "id")) d <- dist(x) pimage(d, seriate(d, "MDS"), col = bluered(100, bias=.5), key = TRUE) ## circular path (has always violations) x <- create_ordered_data(250, k = 5, path = "circular", sd1=2) plot(x, cex=.2, col = attr(x, "id")) d <- dist(x) pimage(d, seriate(d, "OLO"), col = bluered(100, bias=.5), key = TRUE) ## circular path (with more violations violations) x <- create_ordered_data(250, k = 5, path = "circular", sd1=2, sd2=1) plot(x, cex=.2, col = attr(x, "id")) d <- dist(x) pimage(d, seriate(d, "OLO"), col = bluered(100, bias=.5), key = TRUE)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.