ergm-constraints

Sample Space Constraints for Exponential-Family Random Graph Models


Description

ergm is used to fit exponential-family random graph models (ERGMs), in which the probability of a given network, y, on a set of nodes is h(y) exp{η(θ) · g(y)} / c(θ), where h(y) is the reference measure (usually h(y)=1), g(y) is a vector of network statistics for y, η(θ) is a natural parameter vector of the same length (with η(θ)=θ for most terms), and c(θ) is the normalizing constant for the distribution.

This page describes the constraints (the networks y for which h(y)>0) that are included with the ergm package. Other packages may add new constraints.

Constraints formula

A constraints formula is a one- or two-sided formula whose left-hand side is an optional direct selection of the InitErgmProposal function and whose right-hand side is a series of one or more terms separated by “+” and “-” operators, specifying the constraint.

The sample space (over and above the reference distribution) is determined by iterating over the constraints terms from left to right, each term updating it as follows:

  • If the constraint introduces complex dependence structure (e.g., constrains degree or number of edges in the network), then this constraint always restricts the sample space. It may only have a “+” sign.

  • If the constraint only restricts the set of dyads that may vary in the sample space (e.g., block-diagonal structure or fixing specific dyads at specific values) and has a “+” sign, the set of dyads that may vary is restricted to those that may vary according to this constraint and all the constraints to date.

  • If the constraint only restricts the set of dyads that may vary in the sample space but has a “-” sign, the set of dyads that may vary is expanded to those that may vary according to this constraint or according to all the constraints to date.

For example, a constraints formula ~a-b+c-d with all constraints dyadic will allow dyads permitted by either 'a' or 'b' but only if they are also permitted by 'c'; as well as all dyads permitted by 'd'. If 'A', 'B', 'C', and 'D' were logical matrices, the matrix of variable dyads would be equal to '((A|B)&C)|D'.
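The left-to-right rule can be checked with ordinary logical matrices. The sketch below uses four small made-up masks A, B, C and D (TRUE meaning the dyad may vary) and a hypothetical helper, combine_dyad_masks(), that folds them according to their signs. It only illustrates the rule described above and is not how ergm represents constraints internally.

```r
# Hypothetical 2x2 dyad masks: TRUE means "this dyad may vary".
A <- matrix(c(TRUE,  FALSE, FALSE, FALSE), 2, 2)
B <- matrix(c(FALSE, TRUE,  FALSE, FALSE), 2, 2)
C <- matrix(c(TRUE,  FALSE, TRUE,  FALSE), 2, 2)
D <- matrix(c(FALSE, FALSE, FALSE, TRUE),  2, 2)

# Fold the masks from left to right: "+" intersects with the set of
# free dyads so far, "-" takes the union with it.
combine_dyad_masks <- function(masks, signs) {
  free <- matrix(TRUE, nrow(masks[[1]]), ncol(masks[[1]]))
  for (k in seq_along(masks)) {
    free <- if (signs[k] == "+") free & masks[[k]] else free | masks[[k]]
  }
  free
}

# Corresponds to the constraints formula ~a-b+c-d.
free <- combine_dyad_masks(list(A, B, C, D), c("+", "-", "+", "-"))
identical(free, ((A | B) & C) | D)
```

Because the fold is strictly left to right, reordering the terms generally changes the result: ~a+c-b-d would give '((A&C)|B)|D' instead.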

Terms with a positive sign can be viewed as “adding” a constraint while those with a negative sign can be viewed as “relaxing” a constraint.

Constraints implemented in the ergm package

. or NULL (dyad-independent)

A placeholder for no constraints: all networks of a particular size and type have non-zero probability. Cannot be combined with other constraints.

Dyads(fix=NULL, vary=NULL) (dyad-independent)

This is an “operator” constraint that takes one or two ergm formulas. These formulas should contain only dyad-independent terms. For the terms in the fix= formula, dyads that affect the network statistic (i.e., have a nonzero change statistic) for any of the terms will be fixed at their current values. For the terms in the vary= formula, only those dyads that change at least one of the terms will be allowed to vary, and all others will be fixed. If both formulas are given, the dyads that vary either for one or for the other will be allowed to vary. Note that a formula passed to Dyads without an argument name will default to fix=.
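As a rough illustration of the fix= and vary= logic, the base R sketch below uses two made-up nodal attributes, grp and sex, and assumes a term whose change statistic is nonzero exactly when both endpoints share the attribute value (as for a simple matching term). The object names are hypothetical, and the union in the last step is one reading of the rule above for the case where both formulas are given.

```r
# Hypothetical nodal attributes for a 4-node network.
grp <- c("a", "a", "b", "b")
sex <- c("f", "f", "m", "f")

# Dyads whose toggling would change a matching-type statistic.
affects_fix  <- outer(grp, grp, "==")  # stands in for the fix= formula
affects_vary <- outer(sex, sex, "==")  # stands in for the vary= formula

# fix= alone: dyads that affect the statistic are held fixed.
free_fix_only <- !affects_fix
# vary= alone: only dyads that affect the statistic may vary.
free_vary_only <- affects_vary
# Both given (assumed reading): a dyad may vary if either rule lets it vary.
free_both <- free_fix_only | free_vary_only

# Self-loops are not dyads, so never mark them as free.
diag(free_fix_only) <- diag(free_vary_only) <- diag(free_both) <- FALSE
free_both
```

Here dyad (1,2) is fixed under fix= alone (same grp) but becomes free once vary= is added (same sex), while dyad (3,4) stays fixed under both.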

bd(attribs,maxout,maxin,minout,minin)

Constrain maximum and minimum vertex degree. See “Placing Bounds on Degrees” section for more information.

blockdiag(attrname) (dyad-independent)

Force a block-diagonal structure (and its bipartite analogue) on the network. Only dyads (i,j) for which attrname(i)==attrname(j) can have edges.

Note that the current implementation requires that blocks be contiguous for “unipartite” graphs, and for bipartite graphs, they must be contiguous within a partition and must have the same ordering in both partitions. (They do not, however, require that all blocks be represented in both partitions, but those that overlap must have the same order.)
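For a unipartite network, the set of dyads that may have edges and the contiguity requirement can both be sketched in base R. The block vector and the helper is_contiguous() below are made up for illustration and are not part of ergm.

```r
# Hypothetical block labels for 6 nodes, stored in node order.
block <- c(1, 1, 2, 2, 2, 3)

# Dyads (i,j) with block[i] == block[j] may have edges; all others may not.
may_have_edge <- outer(block, block, "==")
diag(may_have_edge) <- FALSE  # no self-loops

# Blocks are contiguous if each label occurs in exactly one run.
is_contiguous <- function(x) !anyDuplicated(rle(as.character(x))$values)

is_contiguous(block)       # labels form three unbroken runs
is_contiguous(c(1, 2, 1))  # label 1 is split into two runs
```

If the labels are not contiguous, one option is to reorder the nodes by block before fitting.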

degrees and nodedegrees

Preserve the degree of each vertex of the given network: only networks whose vertex degrees are the same as those in the network passed in the model formula have non-zero probability. If the network is directed, both indegree and outdegree are preserved.

odegrees, idegrees, b1degrees, b2degrees

For directed networks, odegrees preserves the outdegree of each vertex of the given network, while allowing indegree to vary, and conversely for idegrees. b1degrees and b2degrees perform a similar function for bipartite networks.

degreedist

Preserve the degree distribution of the given network: only networks whose degree distributions are the same as those in the network passed in the model formula have non-zero probability.

idegreedist and odegreedist

Preserve the (respectively) indegree or outdegree distribution of the given network.

edges

Preserve the edge count of the given network: only networks having the same number of edges as the network passed in the model formula have non-zero probability.
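To see how degrees, degreedist and edges differ in how much they restrict the sample space, one can enumerate every undirected network on four nodes and count those each constraint admits. The sketch below is plain base R, takes the path 1-2-3-4 as a made-up observed network, and is only feasible for toy sizes; ergm does not enumerate networks this way.

```r
# All 6 dyads of an undirected 4-node network, one per row.
dyads <- t(combn(4, 2))
# All 2^6 = 64 possible networks: one row per network, 0/1 per dyad.
states <- as.matrix(expand.grid(rep(list(0:1), nrow(dyads))))

# Degree sequence of a network given as a 0/1 vector over `dyads`.
deg <- function(s) tabulate(c(dyads[s == 1, ]), nbins = 4)

# Observed network (hypothetical): the path 1-2-3-4.
obs <- as.integer(paste(dyads[, 1], dyads[, 2]) %in% c("1 2", "2 3", "3 4"))
degs <- t(apply(states, 1, deg))  # 64 x 4 matrix of degree sequences

same_edges   <- rowSums(states) == sum(obs)
same_degdist <- apply(degs, 1, function(d) all(sort(d) == sort(deg(obs))))
same_degrees <- apply(degs, 1, function(d) all(d == deg(obs)))

# Number of networks with non-zero probability under each constraint.
c(edges = sum(same_edges),
  degreedist = sum(same_degdist),
  degrees = sum(same_degrees))
# edges = 20, degreedist = 12, degrees = 2
```

Every network admitted by degrees is also admitted by degreedist, and every network admitted by degreedist is admitted by edges, so the counts shrink in that order.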

observed (dyad-independent)

Preserve the observed dyads of the given network.

fixedas(present,absent) (dyad-independent)

Preserve the edges in 'present' and preclude the edges in 'absent'. Both 'present' and 'absent' can be given either as an edgelist or as a network; a network is converted to the corresponding edgelist.

fixallbut(free.dyads) (dyad-independent)

Preserve the dyad status in all but free.dyads. free.dyads can be given either as an edgelist or as a network; a network is converted to the corresponding edgelist.
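The sets of dyads left free by fixedas and fixallbut can be pictured as logical matrices. The sketch below assumes a directed 4-node network and made-up two-column edgelists; mark() is a hypothetical helper, not an ergm function.

```r
n <- 4
# Hypothetical edgelists: one row per dyad, columns are tail and head.
present    <- rbind(c(1, 2), c(2, 3))
absent     <- rbind(c(1, 4))
free.dyads <- rbind(c(3, 4))

# Mark the dyads listed in a two-column edgelist in an n x n logical matrix.
mark <- function(el, n) {
  m <- matrix(FALSE, n, n)
  m[el] <- TRUE  # a two-column matrix index selects (row, column) cells
  m
}

# fixedas: every dyad may vary except those listed in present or absent.
free_fixedas <- !(mark(present, n) | mark(absent, n))
diag(free_fixedas) <- FALSE  # no self-loops

# fixallbut: only the dyads listed in free.dyads may vary.
free_fixallbut <- mark(free.dyads, n)

sum(free_fixedas)    # 12 directed dyads minus the 3 fixed ones
sum(free_fixallbut)  # the single free dyad (3,4)
```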

Not all combinations of the above are supported.

Placing Bounds on Degrees:

There are many times when one may wish to condition on the number of inedges or outedges possessed by a node. The reason may be some intrinsic property of that node (e.g., to control for activity or popularity processes, or to account for a known outlier whose indegree we wish to limit); a property of the sampling scheme that produced the data (e.g., the survey asked everyone to name only three friends in total); or the attributes of the nodes to which a node has edges (e.g., we specify that nodes designated “male” have a maximum number of outedges to nodes designated “female”). To accomplish this we use the constraints term bd.

Let's consider the simple cases first. Suppose you want to condition on a node's total degree regardless of attributes. That is, if you had a survey that asked respondents to name three alters and no more, then you might want to limit the maximum outdegree to three without regard to any of the alters' attributes. The argument is then:

constraints=~bd(maxout=3)

Similar calls are used to restrict the maximum indegree (maxin), the minimum outdegree (minout), and the minimum indegree (minin).

You can also set ego-specific limits. For example:

constraints=~bd(maxout=rep(c(3,4),c(36,35)))

limits the first 36 nodes to an outdegree of at most 3 and the other 35 nodes to an outdegree of at most 4.

Multiple restrictions can be combined. bd is very flexible. In general, the bd term can contain up to five arguments:

bd(attribs=attribs,
       maxout=maxout,
       maxin=maxin,
       minout=minout,
       minin=minin)

Omitted arguments are unrestricted, and arguments of length 1 are replicated out to all nodes (as above). If an individual entry in maxout,..., minin is NA then no restriction of that kind is applied to that actor.
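The replication and NA rules can be made concrete with a small check of whether a given adjacency matrix respects a maxout bound. The helper within_maxout() and the 3-node adjacency matrix below are hypothetical and written in base R; they mirror the rules as described here and are not ergm's own implementation.

```r
# Does every node's outdegree respect its maxout limit?
within_maxout <- function(adj, maxout) {
  n <- nrow(adj)
  # A length-1 limit is replicated out to all nodes.
  if (length(maxout) == 1) maxout <- rep(maxout, n)
  outdeg <- rowSums(adj)
  # NA means "no restriction of this kind for this node".
  all(is.na(maxout) | outdeg <= maxout)
}

# Hypothetical directed network: 1->2, 1->3, 2->3 (outdegrees 2, 1, 0).
adj <- matrix(0, 3, 3)
adj[1, 2] <- adj[1, 3] <- adj[2, 3] <- 1

within_maxout(adj, 2)             # TRUE: no node exceeds 2
within_maxout(adj, 1)             # FALSE: node 1 has outdegree 2
within_maxout(adj, c(NA, 1, 1))   # TRUE: node 1 is unrestricted
```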

In general, attribs is a matrix of the attributes on which we are conditioning. The dimensions of attribs are n_nodes rows by attrcount columns, where attrcount is the number of distinct attribute values on which we want to condition (i.e., a separate column is required for “male” and “female” if we want to condition on the number of ties to both “male” and “female” partners). The value of attribs[n, i], therefore, is TRUE if node n has attribute value i, and FALSE otherwise. (Note that, since each column represents only a single value of a single attribute, the values of this matrix are all Boolean (TRUE or FALSE).) It is important to note that attribs is a matrix of nodal attributes, not alter attributes.

So, for instance, suppose we want to construct an attribs matrix with two columns, one each for the male and female values of the attribute “sex” on which we are conditioning, and the attribute is stored in ads.sex as a vector of length n_nodes containing 0s (men) and 1s (women). Then our code would look as follows:

# male column: logical vector, TRUE for males
attrsex1 <- (ads.sex == 0)
# female column: logical vector, TRUE for females
attrsex2 <- (ads.sex == 1)
# now create the attribs matrix (71 nodes, one column per attribute value)
attribs <- matrix(ncol=2, nrow=71, data=c(attrsex1, attrsex2))

maxout is a matrix of limits on ties to alters, indexed by the alters' attribute values, with the same dimensions as the attribs matrix: n_nodes rows by attrcount columns. The value of maxout[n,i], therefore, is the maximum number of outedges permitted from node n to nodes with attribute value i (where NA means there is no maximum).

For example: if we wanted to create a maxout matrix to work with our attribs matrix above, with a maximum from every node of five outedges to males and five outedges to females, our code would look like this:

# every node has a maximum of 5 outedges to male alters
maxoutsex1 <- c(rep(5,71))
# every node has a maximum of 5 outedges to female alters
maxoutsex2 <- c(rep(5,71))
# now create maxout matrix
maxout <- cbind(maxoutsex1,maxoutsex2)

The maxin, minout, and minin matrices are constructed exactly like the maxout matrix, except for the maximum allowed indegree, the minimum allowed outdegree, and the minimum allowed indegree, respectively. Note that in an undirected network, we only look at the outdegree matrices; maxin and minin will both be ignored in this case.
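To make the roles of attribs and maxout concrete, the base R sketch below counts each node's outedges to alters with each attribute value and compares the counts with maxout. The 4-node network, the sex vector and the limits are made up for illustration, and the check follows the definition of maxout[n,i] given above rather than ergm's internal code.

```r
# Hypothetical data: nodes 1-2 are male (0), nodes 3-4 are female (1).
sex <- c(0, 0, 1, 1)
attribs <- cbind(male = sex == 0, female = sex == 1)

# Hypothetical directed network: 1->2, 1->3, 1->4, 2->1.
adj <- matrix(0, 4, 4)
adj[1, 2] <- adj[1, 3] <- adj[1, 4] <- adj[2, 1] <- 1

# Entry [n, i] is the number of outedges from node n to alters
# with attribute value i.
out_by_attr <- adj %*% (attribs * 1)

# At most 5 outedges to males and at most 1 to females, for every node.
maxout <- cbind(male = rep(5, 4), female = rep(1, 4))
ok <- is.na(maxout) | out_by_attr <= maxout
all(ok)   # FALSE: node 1 has 2 outedges to females

# Setting node 1's limit on female alters to NA lifts that restriction.
maxout[1, 2] <- NA
ok_relaxed <- is.na(maxout) | out_by_attr <= maxout
all(ok_relaxed)   # TRUE
```

The same comparison with '>=' in place of '<=' would correspond to a minout-style bound.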

References

Goodreau SM, Handcock MS, Hunter DR, Butts CT, Morris M (2008a). A statnet Tutorial. Journal of Statistical Software, 24(8). https://www.jstatsoft.org/v24/i08/.

Hunter, D. R. and Handcock, M. S. (2006) Inference in curved exponential family models for networks, Journal of Computational and Graphical Statistics.

Hunter DR, Handcock MS, Butts CT, Goodreau SM, Morris M (2008b). ergm: A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks. Journal of Statistical Software, 24(3). https://www.jstatsoft.org/v24/i03/.

Krivitsky PN (2012). Exponential-Family Random Graph Models for Valued Networks. Electronic Journal of Statistics, 2012, 6, 1100-1128. doi: 10.1214/12-EJS696

Morris M, Handcock MS, Hunter DR (2008). Specification of Exponential-Family Random Graph Models: Terms and Computational Aspects. Journal of Statistical Software, 24(4). https://www.jstatsoft.org/v24/i04/.


ergm

Fit, Simulate and Diagnose Exponential-Family Models for Networks

v3.11.0
GPL-3 + file LICENSE
Authors
Mark S. Handcock [aut], David R. Hunter [aut], Carter T. Butts [aut], Steven M. Goodreau [aut], Pavel N. Krivitsky [aut, cre] (<https://orcid.org/0000-0002-9101-3362>), Martina Morris [aut], Li Wang [ctb], Kirk Li [ctb], Skye Bender-deMoll [ctb], Chad Klumb [ctb], Michał Bojanowski [ctb], Ben Bolker [ctb]
Initial release
2020-10-14