ergm: san – R documentation

Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!

san

Use Simulated Annealing to attempt to match a network to a vector of mean statistics

Description

This function attempts to find a network or networks whose statistics match those passed in via the target.stats vector.

Usage

san(object, ...)

## S3 method for class 'formula'
san(
  object,
  response = NULL,
  reference = ~Bernoulli,
  constraints = ~.,
  target.stats = NULL,
  nsim = NULL,
  basis = NULL,
  output = c("network", "edgelist", "pending_update_network"),
  only.last = TRUE,
  control = control.san(),
  verbose = FALSE,
  offset.coef = NULL,
  ...
)

## S3 method for class 'ergm_model'
san(
  object,
  response = NULL,
  reference = ~Bernoulli,
  constraints = ~.,
  target.stats = NULL,
  nsim = NULL,
  basis = NULL,
  output = c("network", "edgelist", "pending_update_network"),
  only.last = TRUE,
  control = control.san(),
  verbose = FALSE,
  offset.coef = NULL,
  ...
)

Arguments

`object`	Either a `formula` or an `ergm` object. The `formula` should be of the form `y ~ <model terms>`, where `y` is a network object or a matrix that can be coerced to a `network` object. For the details on the possible `<model terms>`, see `ergm-terms`. To create a `network` object in , use the `network()` function, then add nodal attributes to it using the `%v%` operator if necessary.
`...`	Further arguments passed to other functions.
`response`	Name of the edge attribute whose value is to be modeled in the valued ERGM framework. Defaults to `NULL` for simple presence or absence, modeled via a binary ERGM.
`reference`	A one-sided formula specifying the reference measure (h(y)) to be used. See help for ERGM reference measures implemented in the ergm package.
`constraints`	A one-sided formula specifying one or more constraints on the support of the distribution of the networks being simulated. See the documentation for a similar argument for `ergm` and see list of implemented constraints for more information. For `simulate.formula`, defaults to no constraints. For `simulate.ergm`, defaults to using the same constraints as those with which `object` was fitted.
`target.stats`	A vector of the same length as the number of non-offset statistics implied by the formula, which is either `object` itself in the case of `san.formula` or `object$formula` in the case of `san.ergm`.
`nsim`	Number of networks to generate. Deprecated: just use `replicate()`.
`basis`	If not NULL, a `network` object used to start the Markov chain. If NULL, this is taken to be the network named in the formula.
`output`	Character, one of `"network"` (default), `"edgelist"`, or `"pending_update_network"`: determines the output format. Partial matching is performed.
`only.last`	if `TRUE`, only return the last network generated; otherwise, return a `network.list` with `nsim` networks.
`control`	A list of control parameters for algorithm tuning; see `control.san`.
`verbose`	Logical or numeric giving the level of verbosity. Higher values produce more verbose output.
`offset.coef`	A vector of offset coefficients; these must be passed in by the user. Note that these should be the same set of coefficients one would pass to `ergm` via its `offset.coef` argument. For example, if one has a curved `offset(gwesp)` term in the model, then the `decay` parameter should not be passed in via `offset.coef`.
`formula`	(By default, the `formula` is taken from the `ergm` object. If a different `formula` object is wanted, specify it here.

Details

Acceptance probabilities for proposed toggles are computed as we now describe. There are two contributions: one from targeted statistics and one from offsets.

For the targeted statistics, a matrix of weights W is determined on each san iteration as follows. On the first iteration, the matrix W is the n by n identity matrix (n = number of target statistics), divided by n. On subsequent iterations: if control$SAN.invcov.diag = FALSE (the default), then the matrix W is the inverse of the covariance matrix of the targeted statistics, divided by the sum of its (the inverse's) diagonal; if control$SAN.invcov.diag = TRUE, then W is the inverse of the diagonal (regarded as a matrix) of the covariance matrix of the targeted statistics, divided by the sum of its (the inverse's) diagonal. In either of these two cases, the covariance matrix is computed based on proposals (not acceptances) made on the previous iteration, and the normalization for W is such that sum(diag(W)) = 1. The component of the acceptance probability coming from the targeted statistics is then computed for a given W as exp([y.Wy - x.Wx]/T) where T is the temperature, y the column vector of differences network statistics - target statistics computed before the current proposal is made, x the column vector of differences network statistics - target statistics computed assuming the current proposal is accepted, and . the dot product. If control$SAN.maxit > 1, then on the ith iteration, the temperature T takes the value control$SAN.tau * (1/i - 1/control$SAN.maxit)/(1 - 1/control$SAN.maxit); if control$SAN.maxit = 1, then the temperature T takes the value 0. Thus, T steps down from control$SAN.tau to 0 and is always 0 on the final iteration.

Offsets also contribute to the acceptance probability, as follows. If eta are the canonical offsets and Delta the corresponding change statistics for a given proposal, then the offset contribution to the acceptance probability is simply exp(eta.Delta) where . denotes the dot product. By default, finite offsets are ignored, but this behavior can be changed by setting control$SAN.ignore.finite.offsets = FALSE.

The overall acceptance probability is the product of the targeted statistics contribution and the offset contribution (with the product capped at one).

Value

A network or list of networks that hopefully have network statistics close to the target.stats vector.

Methods (by class)

formula: Sufficient statistics are specified by a formula.
ergm_model: A lower-level function that expects a pre-initialized ergm_model.

Examples

# initialize x to a random undirected network with 50 nodes and a density of 0.1
x <- network(50, density = 0.05, directed = FALSE)
 
# try to find a network on 50 nodes with 300 edges, 150 triangles,
# and 1250 4-cycles, starting from the network x
y <- san(x ~ edges + triangles + cycle(4), target.stats = c(300, 150, 1250))

# check results
summary(y ~ edges + triangles + cycle(4))

# initialize x to a random directed network with 50 nodes
x <- network(50)

# add vertex attributes
x %v% 'give' <- runif(50, 0, 1)
x %v% 'take' <- runif(50, 0, 1)

# try to find a set of 100 directed edges making the outward sum of
# 'give' and the inward sum of 'take' both equal to 62.5, so in
# edges (i,j) the node i tends to have above average 'give' and j
# tends to have above average 'take'
y <- san(x ~ edges + nodeocov('give') + nodeicov('take'), target.stats = c(100, 62.5, 62.5))

# check results
summary(y ~ edges + nodeocov('give') + nodeicov('take'))


# initialize x to a random undirected network with 50 nodes
x <- network(50, directed = FALSE)

# add a vertex attribute
x %v% 'popularity' <- runif(50, 0, 1)

# try to find a set of 100 edges making the total sum of
# popularity(i) and popularity(j) over all edges (i,j) equal to
# 125, so nodes with higher popularity are more likely to be
# connected to other nodes
y <- san(x ~ edges + nodecov('popularity'), target.stats = c(100, 125))
 
# check results
summary(y ~ edges + nodecov('popularity'))

# creates a network with denser "core" spreading out to sparser
# "periphery"
plot(y)

ergm

Fit, Simulate and Diagnose Exponential-Family Models for Networks

v3.11.0

GPL-3 + file LICENSE

Authors

Mark S. Handcock [aut], David R. Hunter [aut], Carter T. Butts [aut], Steven M. Goodreau [aut], Pavel N. Krivitsky [aut, cre] (<https://orcid.org/0000-0002-9101-3362>), Martina Morris [aut], Li Wang [ctb], Kirk Li [ctb], Skye Bender-deMoll [ctb], Chad Klumb [ctb], Michał Bojanowski [ctb], Ben Bolker [ctb]

Initial release

2020-10-14

san