
data

Create Simulated Data for Seriation Evaluation


Description

Several functions to create simulated data to evaluate different aspects of seriation algorithms and criterion functions.

Usage

create_lines_data(n = 250)
create_ordered_data(n = 250, k = 2, size = NULL, spacing = 6, path = "linear",
  sd1 = 1, sd2 = 0)

Arguments

n

number of data points to create.

k

number of Gaussian components.

size

relative sizes (numbers of points) of the components, given as a vector of length k. If NULL, all components have the same size.

spacing

space between the centers of adjacent components. With the default of 6, adjacent components barely touch when sd1 = 1 (3 standard deviations on each side of each component's center).

path

shape of the path along which the components are arranged: "linear" or "circular".

sd1

standard deviation along the path. With the default spacing, values greater than 1 make adjacent components overlap (mix).

sd2

standard deviation perpendicular to the path. Values greater than 0 introduce anti-Robinson violation events.
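To make the roles of n, k, size, spacing, sd1 and sd2 concrete, here is a minimal base-R sketch of a linear path only. It is an assumption about how such data can be generated, not the seriation source: sim_ordered_linear() is a hypothetical helper, and returning a two-column matrix with an "id" attribute is chosen only to match how the examples use attr(x, "id").

```r
# Hypothetical sketch (not the seriation source): place k Gaussian
# components along a straight path (the x-axis), centers `spacing` apart.
sim_ordered_linear <- function(n = 250, k = 2, size = NULL, spacing = 6,
                               sd1 = 1, sd2 = 0) {
  if (is.null(size)) size <- rep(1, k)   # equal-sized components
  stopifnot(length(size) == k)
  # split the n points across components in proportion to `size`
  counts <- round(n * size / sum(size))
  counts[k] <- n - sum(counts[-k])       # force the counts to sum to n
  id <- rep(seq_len(k), times = counts)
  centers <- (seq_len(k) - 1) * spacing
  along <- rnorm(n, mean = centers[id], sd = sd1)  # spread along the path
  across <- rnorm(n, mean = 0, sd = sd2)           # spread perpendicular to it
  x <- cbind(x = along, y = across)
  attr(x, "id") <- id                    # component membership
  x
}

set.seed(42)
x <- sim_ordered_linear(250, k = 5)
table(attr(x, "id"))   # 50 points in each of the 5 components
```

With sd1 = 1, about 99.7% of each component lies within 3 units of its center, so two centers 6 units apart have 3-standard-deviation ranges that just meet at the midpoint. This is the "barely touch" behavior described for spacing.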

Details

create_lines_data creates the lines data set used for iVAT in Havens and Bezdek (2012).

create_ordered_data is a versatile function that creates "orderable" 2D data using Gaussian components along a linear or circular path. The components are equally spaced (spacing) along the path. The default spacing of 6 ensures that two adjacent components with a standard deviation of 1 along the direction of the path will barely touch. The standard deviation along the path is set by sd1, and the standard deviation perpendicular to the path is set by sd2. A value of sd2 larger than zero results in data that are not perfectly orderable (i.e., the resulting distance matrix will not be a perfect pre-anti-Robinson matrix and will contain anti-Robinson violation events after seriation). Note that a circular path always creates anti-Robinson violations, since the circle has to be broken at some point to create a linear order.
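The notion of an anti-Robinson violation event can be illustrated with a small base-R counter. It is a sketch under the usual definition (for every triple i < j < k in the current order, the outer distance d[i, k] should not be smaller than either inner distance d[i, j] or d[j, k]). ar_events() is a hypothetical helper; seriation's own criterion() with method "AR_events" is the version to use in practice. The cubic loop is only meant for tiny inputs.

```r
# Count anti-Robinson violation events of a dissimilarity matrix in its
# current row/column order: for each triple i < j < k, the outer distance
# d[i, k] should not be smaller than either inner distance d[i, j] or d[j, k].
ar_events <- function(d) {
  d <- as.matrix(d)
  n <- nrow(d)
  events <- 0
  if (n < 3) return(events)
  for (i in 1:(n - 2)) {
    for (j in (i + 1):(n - 1)) {
      for (k in (j + 1):n) {
        events <- events + (d[i, j] > d[i, k]) + (d[j, k] > d[i, k])
      }
    }
  }
  events
}

# points on a line, listed in sorted order: a perfect pre-anti-Robinson matrix
x <- c(0, 1, 3, 6, 10)
ar_events(dist(x))                       # 0
# swapping two points breaks the order and creates violations
ar_events(dist(x[c(1, 3, 2, 4, 5)]))     # 4

# points on a circle in angular order: cutting the circle into a line
# necessarily leaves some violations
angle <- 2 * pi * (0:7) / 8
ar_events(dist(cbind(cos(angle), sin(angle)))) > 0   # TRUE
```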

Author(s)

Michael Hahsler

References

Havens, T.C. and Bezdek, J.C. (2012): An Efficient Formulation of the Improved Visual Assessment of Cluster Tendency (iVAT) Algorithm, IEEE Transactions on Knowledge and Data Engineering, 24(5), 813–822.


Examples

library(seriation)

## lines data set from Havens and Bezdek (2012)
x <- create_lines_data(250)
plot(x, xlim=c(-5,5), ylim=c(-3,3), cex=.2, col = attr(x, "id"))  
d <- dist(x)
pimage(d, seriate(d, "OLO_single"), col = bluered(100, bias=.5), key = TRUE)
  
## create_ordered_data can produce many types of "orderable" data

## perfect pre-anti-Robinson matrix (a single component)
x <- create_ordered_data(250, k = 1)  
plot(x, cex=.2, col = attr(x, "id"))  
d <- dist(x)
pimage(d, seriate(d, "MDS"), col = bluered(100, bias=.5), key = TRUE)  
  
## separated components
x <- create_ordered_data(250, k = 5)  
plot(x, cex=.2, col = attr(x, "id"))  
d <- dist(x)
pimage(d, seriate(d, "MDS"), col = bluered(100, bias=.5), key = TRUE)  
  
## overlapping components
x <- create_ordered_data(250, k = 5, sd1 = 2)  
plot(x, cex=.2, col = attr(x, "id"))
d <- dist(x)
pimage(d, seriate(d, "MDS"), col = bluered(100, bias=.5), key = TRUE)  
    
## introduce anti-Robinson violations (non-zero sd2 spreads points perpendicular to the path)
x <- create_ordered_data(250, k = 5, sd1 = 2, sd2 = 5)  
plot(x, cex=.2, col = attr(x, "id"))  
d <- dist(x)
pimage(d, seriate(d, "MDS"), col = bluered(100, bias=.5), key = TRUE)  
  
## circular path (always has violations)
x <- create_ordered_data(250, k = 5, path = "circular", sd1=2)  
plot(x, cex=.2, col = attr(x, "id"))
d <- dist(x)
pimage(d, seriate(d, "OLO"), col = bluered(100, bias=.5), key = TRUE)  

## circular path (with more violations)
x <- create_ordered_data(250, k = 5, path = "circular", sd1=2, sd2=1)  
plot(x, cex=.2, col = attr(x, "id"))
d <- dist(x)
pimage(d, seriate(d, "OLO"), col = bluered(100, bias=.5), key = TRUE)
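Several examples above seriate with method "MDS". The idea can be sketched in base R, assuming the method amounts to sorting objects by their coordinate on the first classical MDS dimension (stats::cmdscale). mds_order() is a hypothetical helper, not part of seriation, and the package's implementation may differ in detail.

```r
# Hypothetical sketch: order objects by their coordinate on the first
# classical MDS dimension. The sign of an MDS axis is arbitrary, so the
# recovered order may come out reversed.
mds_order <- function(d) {
  coord <- cmdscale(d, k = 1)[, 1]
  order(coord)
}

x <- cbind(c(5, 1, 9, 3, 7), 0)   # points on a line, stored out of order
o <- mds_order(dist(x))
o   # c(2, 4, 1, 5, 3) or its reverse
```

For collinear points the first MDS coordinate is the centered position along the line (up to sign), so this ordering recovers the line exactly; for the curved or noisy data produced by create_ordered_data it is only an approximation.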

seriation

Infrastructure for Ordering Objects Using Seriation

v1.2-9
GPL-3
Authors
Michael Hahsler [aut, cre, cph], Christian Buchta [aut, cph], Kurt Hornik [aut, cph], Fionn Murtagh [ctb, cph], Michael Brusco [ctb, cph], Stephanie Stahl [ctb, cph], Hans-Friedrich Koehn [ctb, cph]
Initial release
2020-09-29
