optpart: maxsimset – R documentation

maxsimset

Maximally Similar Sets Analysis

Description

Maximally similar sets is an approach to deriving relatively homogeneous subsets of objects as determined by similarity of the composition of the objects. Maximally similar sets are a covering, as opposed to a partition, of objects. The sets so derived can be tested against random sets of the same size to determine whether a vector of independent data exhibits an improbably restricted distribution within the sets.

Usage

maxsimset(dist,size=NULL,alphac=NULL,mean=FALSE)
mss.test(mss, env, panel = 'all', main = deparse(substitute(env)), 
         ...)
## S3 method for class 'mss'
plot(x, ...)
## S3 method for class 'mss'
getsets(mss)

Arguments

`dist`	a dist object from `dist`, `dsvdis`, or `vegdist`
`size`	the size of desired sets
`alphac`	the alpha-cut to specify maximum dissimilarity for inclusion in a set
`mean`	if mean is FALSE (the default), the algorithm uses a furthest neighbor criterion; if mean is TRUE, it uses a mean similarity criterion
`mss`	an object of class ‘mss’
`env`	a quantitative environmental variable for analysis
`main`	a title for the plot of mss.test
`panel`	a switch to indicate which panel to draw: an integer panel number, or 'all' (the default)
`x`	an object of class ‘mss’ from maxsimset
`...`	ancillary arguments for ‘plot’

Details

maxsimset starts with each sample as a seed, and adds the most similar sample to the set. Samples are then added in turn (up to the size specified, or to the maximum dissimilarity specified) in order of maximum similarity. If mean is FALSE, the sample most similar to a set is the sample with the max-min similarity, that is, the sample whose minimum similarity to the set is highest, equivalent to furthest-neighbor or complete-linkage in cluster analysis. If mean is TRUE, the sample most similar to a set is the sample with the highest mean similarity to the set. Once the sets are determined for each seed, the list is examined for duplicate sets, which are deleted, to return the list of unique sets.

If ‘alphac’ is specified, sets are grown until they reach the maximum size or the maximum dissimilarity specified by alphac, whichever limit is reached first.
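
The set-growing procedure described above can be sketched in base R. This is an illustration only, not the optpart implementation: the helper name `grow_set` and the toy data are hypothetical, and applying `alphac` to the furthest-neighbor (or mean) dissimilarity of the candidate is an assumption about how the cut is used.

```r
# Illustrative sketch only; grow_set() is a hypothetical helper, not part of optpart.
grow_set <- function(dis, seed, size, alphac = NULL, mean = FALSE) {
  d <- as.matrix(dis)                      # full symmetric dissimilarity matrix
  members <- seed
  while (length(members) < size) {
    candidates <- setdiff(seq_len(nrow(d)), members)
    if (length(candidates) == 0) break
    sub <- d[candidates, members, drop = FALSE]
    # Dissimilarity of each candidate to the current set:
    # largest dissimilarity to any member (furthest neighbor) or the mean.
    score <- if (mean) rowMeans(sub) else apply(sub, 1, max)
    best <- which.min(score)               # most similar candidate
    # Assumed use of the alpha-cut: stop once the best candidate is too dissimilar.
    if (!is.null(alphac) && score[best] > alphac) break
    members <- c(members, candidates[best])
  }
  members
}

# Toy data: five points on a line, forming two loose groups.
x <- c(0, 1, 2, 10, 11)
dis <- dist(x)

# Grow one set of size 3 from every seed, then drop duplicate sets.
sets <- lapply(seq_along(x), grow_set, dis = dis, size = 3)
unique_sets <- unique(lapply(sets, sort))
```

Because sets are grown from every seed, sample 3 ends up in both of the unique sets here, which shows why the result is a covering rather than a partition.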

The ‘mss.test’ function analyzes within-set variability in attributes of the objects other than those used to calculate the similarity relation. If maximally similar sets exhibit a narrower range of values than expected at random, it may be that the variable analyzed has an underlying role in determining the attributes on which the similarity is calculated. The function ‘plot’ plots the sorted within-set range of values in red, and the sorted range of values of random sets of the same size in black. This is followed by a boxplot of within-set values for the random replicates versus the observed sets, and a Wilcoxon rank sum test of the difference is calculated.
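
The idea behind the test can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code used by mss.test: the data, the hypothetical sets, the helper `within_range`, the number of random replicates (100), and the one-sided alternative are all choices made for the example.

```r
set.seed(1)

# Hypothetical environmental variable for nine samples.
env <- c(100, 110, 120, 500, 510, 520, 900, 910, 920)

# Hypothetical maximally similar sets, given as sample indices.
sets <- list(1:3, 4:6, 7:9)

# Within-set range of the environmental variable.
within_range <- function(idx) diff(range(env[idx]))

# Observed ranges, and ranges for random sets of the same size.
obs <- sapply(sets, within_range)
rnd <- replicate(100, within_range(sample(length(env), 3)))

# Are the observed ranges smaller than those of random sets?
# exact = FALSE avoids the warning about ties in the exact test.
res <- wilcox.test(obs, rnd, alternative = "less", exact = FALSE)
res$p.value
```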

‘getsets’ expands and pulls out the maximally similar sets as a list of logical membership vectors for use in other analyses.
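
As an illustration of what a list of logical membership vectors looks like, the sketch below builds one from hypothetical sets given as sample indices. It does not call getsets, and the exact layout of the object that getsets returns is not assumed beyond the description above.

```r
# Hypothetical sets of sample indices for a data set of n = 5 samples.
members <- list(c(1L, 2L, 3L), c(3L, 4L, 5L))
n <- 5

# One logical vector per set: TRUE where the sample belongs to the set.
membership <- lapply(members, function(idx) seq_len(n) %in% idx)

# A logical vector can subset any per-sample variable directly.
env <- c(10, 12, 15, 30, 31)
env[membership[[2]]]
```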

Value

an object of class ‘mss’, a list with elements:

`musubx`	a matrix of sample membership in the sets where membership is given by the similarity with which a sample joined the set
`member`	a list of set members in the order they were added to the set
`numset`	the number of unique sets derived
`size`	the number of members in each set
`distname`	the name of the dissimilarity/distance object employed

Author(s)

David W. Roberts droberts@montana.edu

Examples

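library(optpart)
data(shoshveg)                             # vegetation data
data(shoshsite)                            # site (environmental) data
dis.bc <- dsvdis(shoshveg,'bray/curtis')   # Bray/Curtis dissimilarity
mss.10 <- maxsimset(dis.bc,10)             # maximally similar sets of size 10
## Not run: 
mss.test(mss.10,shoshsite$elevation)       # plots graph and produces summary
## End(Not run)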

optpart

Optimal Partitioning of Similarity Relations

v3.0-3

GPL (>= 2)
