Sample Space Constraints for Exponential-Family Random Graph Models
ergm is used to fit exponential-family random graph models (ERGMs), in which the probability of a given network, y, on a set of nodes is h(y) exp{η(θ) · g(y)} / c(θ), where h(y) is the reference measure (usually h(y)=1), g(y) is a vector of network statistics for y, η(θ) is a natural parameter vector of the same length (with η(θ)=θ for most terms), and c(θ) is the normalizing constant for the distribution.
This page describes the constraints (the networks y for which h(y)>0) that are included with the ergm package. Other packages may add new constraints.
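As a minimal sketch of the definition above, the following base R code enumerates every undirected network on three nodes and computes its probability with the edge count as the only statistic. The parameter value theta and the variable names are illustrative assumptions, and the code does not call the ergm package. It then mimics an edge-count constraint by setting h(y)=0 for every network that does not have exactly two edges.

```r
# All 2^3 = 8 undirected networks on 3 nodes: one row per network,
# one column per dyad (1-2, 1-3, 2-3); 1 means the edge is present.
graphs <- as.matrix(expand.grid(d12 = 0:1, d13 = 0:1, d23 = 0:1))

theta <- -0.5                  # illustrative parameter value (assumption)
g <- rowSums(graphs)           # g(y): edge count of each network
h <- rep(1, nrow(graphs))      # h(y) = 1: every network is allowed

weights <- h * exp(theta * g)  # unnormalized h(y) exp{theta * g(y)}
c_theta <- sum(weights)        # normalizing constant c(theta)
prob <- weights / c_theta      # probability of each network

# A constraint sets h(y) = 0 outside the allowed sample space.
# Here: keep only networks with exactly 2 edges.
h_edges <- as.numeric(g == 2)
weights_edges <- h_edges * exp(theta * g)
prob_edges <- weights_edges / sum(weights_edges)
```

With the edge count fixed, the statistic g(y) is constant over the allowed networks, so the three two-edge networks are equally likely whatever the value of theta.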
A constraints formula is a one- or two-sided formula whose left-hand side is an optional direct selection of the InitErgmProposal function and whose right-hand side is a series of one or more terms separated by “+” and “-” operators, specifying the constraint.
The sample space (over and above the reference distribution) is determined by iterating over the constraints terms from left to right, each term updating it as follows:
If the constraint introduces complex dependence structure (e.g., constrains degree or number of edges in the network), then this constraint always restricts the sample space. It may only have a “+” sign.
If the constraint only restricts the set of dyads that may vary in the sample space (e.g., block-diagonal structure or fixing specific dyads at specific values) and has a “+” sign, the set of dyads that may vary is restricted to those that may vary according to both this constraint and all the constraints so far.
If the constraint only restricts the set of dyads that may vary in the sample space but has a “-” sign, the set of dyads that may vary is expanded to those that may vary according to either this constraint or all the constraints so far.
For example, a constraints formula ~a-b+c-d with all constraints dyadic will allow dyads permitted by either 'a' or 'b', but only if they are also permitted by 'c', as well as all dyads permitted by 'd'. If 'A', 'B', 'C', and 'D' were logical matrices, the matrix of variable dyads would be equal to '((A|B)&C)|D'.
Terms with a positive sign can be viewed as “adding” a constraint while those with a negative sign can be viewed as “relaxing” a constraint.
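The left-to-right rule can be sketched in base R with logical matrices standing in for the dyads each constraint allows to vary. The helper combine_dyad_masks and the four small masks are hypothetical and serve only to illustrate the logic. This is not how ergm represents constraints internally.

```r
# Hypothetical helper: fold dyad masks from left to right.
# "+" intersects with the running mask, "-" unions with it.
combine_dyad_masks <- function(masks, signs) {
  free <- matrix(TRUE, nrow(masks[[1]]), ncol(masks[[1]]))  # start unconstrained
  for (k in seq_along(masks)) {
    free <- if (signs[k] == "+") free & masks[[k]] else free | masks[[k]]
  }
  free
}

n <- 4
# Hypothetical "may vary" masks for four dyadic constraints a, b, c, d
A <- matrix(FALSE, n, n); A[1, 2] <- TRUE
B <- matrix(FALSE, n, n); B[1, 3] <- TRUE
C <- matrix(FALSE, n, n); C[1, 2] <- TRUE; C[2, 3] <- TRUE
D <- matrix(FALSE, n, n); D[3, 4] <- TRUE

# Equivalent of the constraints formula ~a-b+c-d
free <- combine_dyad_masks(list(A, B, C, D), c("+", "-", "+", "-"))
```

Here dyad (1,3) is allowed by 'b' but dropped by 'c', and dyad (3,4) is re-admitted by 'd', so free matches ((A|B)&C)|D.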
Constraints implemented in the ergm package.
. or NULL
(dyad-independent) A placeholder for no constraints: all networks of a particular size and type have non-zero probability. Cannot be combined with other constraints.
Dyads(fix=NULL, vary=NULL)
(dyad-independent) This is an “operator” constraint that takes one or two ergm formulas. These formulas should contain only dyad-independent terms. For the terms in the fix= formula, dyads that affect the network statistic (i.e., have a nonzero change statistic) for any of the terms will be fixed at their current values. For the terms in the vary= formula, only those dyads that change at least one of the terms will be allowed to vary, and all others will be fixed. If both formulas are given, the dyads that vary under either one will be allowed to vary. Note that a formula passed to Dyads without an argument name will default to fix=.
bd(attribs,maxout,maxin,minout,minin)
Constrain maximum and minimum vertex degree. See the “Placing Bounds on Degrees” section for more information.
blockdiag(attrname)
(dyad-independent) Force a block-diagonal structure (and its bipartite analogue) on the network. Only dyads (i,j) for which attrname(i)==attrname(j) can have edges.
Note that the current implementation requires that blocks be contiguous for “unipartite” graphs, and for bipartite graphs, they must be contiguous within a partition and must have the same ordering in both partitions. (Not every block has to be represented in both partitions, but those that are must appear in the same order.)
degrees and nodedegrees
Preserve the degree of each vertex of the given network: only networks whose vertex degrees are the same as those in the network passed in the model formula have non-zero probability. If the network is directed, both indegree and outdegree are preserved.
odegrees, idegrees, b1degrees, b2degrees
For directed networks, odegrees preserves the outdegree of each vertex of the given network, while allowing indegree to vary, and conversely for idegrees. b1degrees and b2degrees perform a similar function for bipartite networks.
degreedist
Preserve the degree distribution of the given network: only networks whose degree distributions are the same as those in the network passed in the model formula have non-zero probability.
idegreedist and odegreedist
Preserve the (respectively) indegree or outdegree distribution of the given network.
edges
Preserve the edge count of the given network: only networks having the same number of edges as the network passed in the model formula have non-zero probability.
observed
(dyad-independent) Preserve the observed dyads of the given network.
fixedas(present,absent)
(dyad-independent) Preserve the edges in 'present' and preclude the edges in 'absent'. Both 'present' and 'absent' can be given either as an edgelist or as a network; a network is converted to the corresponding edgelist.
fixallbut(free.dyads)
(dyad-independent) Preserve the dyad status in all but free.dyads. free.dyads can be given either as an edgelist or as a network; a network is converted to the corresponding edgelist.
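The dyad-independent constraints in this list can each be pictured as a logical matrix of dyads that may vary. The base R sketch below builds such matrices for a blockdiag-style and a fixallbut-style restriction on a hypothetical five-node undirected network. The block labels and the edgelist are made up for illustration, and the code shows only the logic, not the ergm implementation.

```r
# Hypothetical block membership for 5 nodes (contiguous blocks)
block <- c("a", "a", "b", "b", "b")

# blockdiag-style mask: dyad (i, j) may vary only if both nodes share a block
free_block <- outer(block, block, "==")
diag(free_block) <- FALSE              # self-ties never vary

# fixallbut-style mask: only dyads listed in a two-column edgelist may vary
free_dyads <- rbind(c(1, 2), c(3, 4), c(1, 5))
free_list <- matrix(FALSE, 5, 5)
free_list[free_dyads] <- TRUE          # index by (row, column) pairs
free_list[free_dyads[, 2:1]] <- TRUE   # mirror for an undirected network

# Adding both restrictions with a "+" sign intersects the two masks
free <- free_block & free_list
```

Dyad (1,5) is listed as free but crosses blocks, so it stays fixed; only dyads (1,2) and (3,4) remain free to vary.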
Not all combinations of the above are supported.
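To make the difference between the edges, degrees, and degreedist constraints concrete, the base R sketch below checks whether a candidate network stays inside the sample space defined by an observed one. The helper names and the two small path networks are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: symmetric 0/1 adjacency matrix from an edgelist
make_net <- function(n, edges) {
  adj <- matrix(0, n, n)
  adj[edges] <- 1
  adj[edges[, 2:1, drop = FALSE]] <- 1
  adj
}

observed  <- make_net(4, rbind(c(1, 2), c(2, 3), c(3, 4)))  # path 1-2-3-4
candidate <- make_net(4, rbind(c(2, 1), c(1, 3), c(3, 4)))  # path 2-1-3-4

# edges: same number of edges (each undirected edge is counted twice in both)
same_edges <- function(x, y) sum(x) == sum(y)
# degrees: every vertex keeps its own degree
same_degrees <- function(x, y) all(rowSums(x) == rowSums(y))
# degreedist: the multiset of degrees is unchanged
same_degreedist <- function(x, y) identical(sort(rowSums(x)), sort(rowSums(y)))

keeps_edges      <- same_edges(observed, candidate)
keeps_degrees    <- same_degrees(observed, candidate)
keeps_degreedist <- same_degreedist(observed, candidate)
```

The candidate has the same edge count and the same degree distribution as the observed network, but nodes 1 and 2 have swapped degrees, so it would be allowed under edges and degreedist but not under degrees.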
Placing Bounds on Degrees
There are many times when one may wish to condition on the number of inedges or outedges possessed by a node: as a consequence of some intrinsic property of that node (e.g., to control for activity or popularity processes, or to account for a known outlier whose indegree we wish to limit), as an intrinsic property of the sampling scheme that produced the data (e.g., the survey asked everyone to name only three friends in total), or as a function of the attributes of the nodes to which a node has edges (e.g., we specify that nodes designated “male” have a maximum number of outedges to nodes designated “female”). To accomplish this we use the constraints term bd.
Let's consider the simple cases first. Suppose you want to condition on total degree regardless of attributes. For example, if you had a survey that asked respondents to name three alters and no more, you might want to limit the maximum outdegree to three without regard to any of the alters' attributes. The argument is then:
constraints=~bd(maxout=3)
Similar calls are used to restrict the maximum indegree (maxin), the minimum outdegree (minout), and the minimum indegree (minin).
You can also set ego-specific limits. For example:
constraints=~bd(maxout=rep(c(3,4),c(36,35)))
limits the first 36 nodes to an outdegree of at most 3 and the other 35 nodes to an outdegree of at most 4.
Multiple restrictions can be combined. bd is very flexible.
In general, the bd term can contain up to five arguments:
bd(attribs=attribs, maxout=maxout, maxin=maxin, minout=minout, minin=minin)
Omitted arguments are unrestricted, and arguments of length 1 are replicated out to all nodes (as above). If an individual entry in maxout, ..., minin is NA, then no restriction of that kind is applied to that actor.
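The rules for omitted, scalar, and NA limits can be illustrated with a small base R check. The helpers expand_limit and respects_maxout are hypothetical, written only to mirror the description above. They are not part of ergm, and the 71-node network is filled with made-up ties.

```r
n_nodes <- 71

# Hypothetical helper: expand a bd-style limit to one value per node
expand_limit <- function(limit, n) {
  if (is.null(limit)) return(rep(NA_real_, n))   # omitted: unrestricted
  if (length(limit) == 1) return(rep(limit, n))  # scalar: same for every node
  stopifnot(length(limit) == n)
  limit
}

# Hypothetical helper: TRUE if every node's outdegree respects maxout
respects_maxout <- function(adj, maxout) {
  maxout <- expand_limit(maxout, nrow(adj))
  outdeg <- rowSums(adj)
  all(is.na(maxout) | outdeg <= maxout)          # NA means no restriction
}

maxout <- rep(c(3, 4), c(36, 35))  # ego-specific limits, as in the text

adj <- matrix(0, n_nodes, n_nodes)
adj[40, 1:4] <- 1                  # node 40 sends 4 ties; its limit is 4
ok <- respects_maxout(adj, maxout)

adj[1, 2:5] <- 1                   # node 1 sends 4 ties; its limit is 3
bad <- respects_maxout(adj, maxout)

maxout[1] <- NA                    # lift the restriction for node 1 only
ok_na <- respects_maxout(adj, maxout)
```

The first check passes, the second fails because node 1 exceeds its limit of 3, and the third passes again once that entry is set to NA.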
In general, attribs is a matrix of the attributes on which we are conditioning. The dimensions of attribs are n_nodes rows by attrcount columns, where attrcount is the number of distinct attribute values on which we want to condition (i.e., a separate column is required for “male” and “female” if we want to condition on the number of ties to both “male” and “female” partners). The value of attribs[n, i], therefore, is TRUE if node n has attribute value i, and FALSE otherwise. (Note that, since each column represents only a single value of a single attribute, the values of this matrix are all Boolean (TRUE or FALSE).) It is important to note that attribs is a matrix of nodal attributes, not alter attributes.
So, for instance, if we wanted to construct an attribs matrix with two columns, one each for male and female attribute values (we are conditioning on these values of the attribute “sex”), and the attribute sex is represented in ads.sex as an n_nodes-long vector of 0s and 1s (men and women), then our code would look as follows:
# male column: bit vector, TRUE for males
attrsex1 <- (ads.sex == 0)
# female column: bit vector, TRUE for females
attrsex2 <- (ads.sex == 1)
# now create attribs matrix
attribs <- matrix(ncol=2, nrow=71, data=c(attrsex1,attrsex2))
maxout is a matrix of limits indexed by alter attribute, with the same dimensions as the attribs matrix: n_nodes rows by attrcount columns. The value of maxout[n,i], therefore, is the maximum number of outedges permitted from node n to nodes with the attribute i (where an NA means there is no maximum).
For example, if we wanted to create a maxout matrix to work with our attribs matrix above, with a maximum from every node of five outedges to males and five outedges to females, our code would look like this:
# every node has maximum of 5 outdegrees to male alters
maxoutsex1 <- rep(5,71)
# every node has maximum of 5 outdegrees to female alters
maxoutsex2 <- rep(5,71)
# now create maxout matrix
maxout <- cbind(maxoutsex1,maxoutsex2)
The maxin, minout, and minin matrices are constructed exactly like the maxout matrix, except that they give the maximum allowed indegree, the minimum allowed outdegree, and the minimum allowed indegree, respectively. Note that in an undirected network, we only look at the outdegree matrices; maxin and minin will both be ignored in this case.
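How attribs and maxout interact can be sketched with a matrix product in base R: multiplying the adjacency matrix by attribs gives each node's outdegree to alters with each attribute value, which is then compared entrywise with maxout. The six-node network, the sex vector, and the limits below are all made-up assumptions, and the check illustrates the rule described above rather than the ergm code.

```r
n_nodes <- 6
sex <- c(0, 0, 0, 1, 1, 1)         # hypothetical: 0 = male, 1 = female
attribs <- cbind(male = sex == 0, female = sex == 1)

# maxout[n, i]: most outgoing ties node n may send to alters with attribute i
maxout <- matrix(2, n_nodes, 2)
maxout[1, 2] <- NA                 # node 1: no cap on ties to females

# Directed adjacency matrix: adj[n, m] == 1 means a tie from n to m
adj <- matrix(0, n_nodes, n_nodes)
adj[1, 4:6] <- 1                   # node 1 -> three females
adj[2, c(1, 3)] <- 1               # node 2 -> two males

# out_by_attr[n, i]: outdegree of node n to alters with attribute i
# (attribs * 1 converts the logical matrix to 0/1)
out_by_attr <- adj %*% (attribs * 1)
within_limits <- all(is.na(maxout) | out_by_attr <= maxout)

# Node 3 now also sends three ties to females, but its cap is 2
adj2 <- adj
adj2[3, 4:6] <- 1
within_limits2 <- all(is.na(maxout) | (adj2 %*% (attribs * 1)) <= maxout)
```

The first network passes because node 1's cap on ties to females is NA; the second fails because node 3 exceeds its cap of 2.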