RSiena: siena07 – R documentation

Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!

siena07

Function to estimate parameters in a Siena model

Description

Estimates parameters in a Siena model using Method of Moments, based on direct simulation, conditional or otherwise; or using Generalized Method of Moments; or using Maximum Likelihood by MCMC simulation. Estimation is done using a Robbins-Monro algorithm. Note that the data and particular model to be used must be passed in using named arguments as the ..., and the specification for the algorithm must be passed on as x, which is a sienaAlgorithm object as produced by sienaAlgorithmCreate (see examples).

Usage

siena07(x, batch=FALSE, verbose=FALSE, silent=FALSE,
        useCluster=FALSE, nbrNodes=2,
        thetaValues = NULL,
        initC=TRUE,
        clusterString=rep("localhost", nbrNodes), tt=NULL,
        parallelTesting=FALSE, clusterIter=!x$maxlike,
        clusterType=c("PSOCK", "FORK"), cl=NULL, ...)

Arguments

`x`	A control object, of class `sienaAlgorithm.`
`batch`	Desired interface: `FALSE` gives a gui (graphical user interface implemented as a tcl/tk screen), `TRUE` gives a small (if `verbose=FALSE`) amount of printout to the console.
`verbose`	Produces various output to the console if `TRUE`.
`silent`	Produces no output to the console if `TRUE`, even if batch mode.
`useCluster`	Boolean: whether to use a cluster of processes (useful if multiple processors are available).
`nbrNodes`	Number of processes to use if useCluster is `TRUE`.
`thetaValues`	If not `NULL`, this should be a matrix with parameter values to be used in Phase 3. The number of columns must be equal to the number of estimated parameters in the effects object (if conditional estimation is used, without the rate parameters for the conditioning dependent variable). Can only be used if `x$simOnly=TRUE`.
`initC`	Boolean: set to `TRUE` if the simulation will use C routines (currently always needed). Only for use if using multiple processors, to ensure all copies are initialised correctly. Ignored otherwise, so is set to `TRUE` by default.
`clusterString`	Definitions of clusters. Default set up to use the local machine only.
`tt`	A `tcltk` toplevel window. Used if called from the model options screen.
`parallelTesting`	Boolean. If `TRUE`, sets up random numbers to parallel those in Siena 3.
`clusterIter`	Boolean. If `TRUE`, multiple processes execute complete iterations at each call. If `FALSE`, multiple processes execute a single wave at each call.
`clusterType`	Either "PSOCK" or "FORK". On Windows, must be "PSOCK". On a single non-Windows machine may be "FORK", and subprocesses will be formed by forking. If "PSOCK", subprocesses are formed using R scripts.
`cl`	An object of class c("SOCKcluster", "cluster") (see Details).
`...`	Arguments for the simulation function, see `simstats0c`: in any case, `data` and `effects`, as in the examples below; possibly also `prevAns` if a previous reasonable provisional estimate was obtained for a similar model; possibly also `returnDeps` if the simulated dependent variables (networks, behaviour) should be returned; possibly also `returnChains` if the simulated sequences (chains) of ministeps should be returned; this may produce a very big file.

Details

This is the main function and workhorse of RSiena.

For use of siena07, it is necessary to specify parameters data (RSiena data set) and effects (effects object), which are required parameters in function simstats0c. (These parameters are inserted through '...'.) See the examples.

siena07 runs a Robbins-Monro algorithm for parameter estimation according to the Method of Moments using the three-phase implementation in Snijders (2001) and Snijders, Steglich and Schweinberger (2007), with (if x$findiff=FALSE) derivative estimation as in Schweinberger and Snijders (2007).
If x$gmm=TRUE and myeff contains one or more gmm statistics as included by function includeGMoMStatistics, the algorithm employs the Generalized Method of Moments as defined in Amati, Schoenenberger, and Snijders (2015, 2019).
Phase 1 does a few iterations to estimate the derivative matrix of the targets with respect to the parameter vector. Phase 2 does the estimation. Phase 3 runs a simulation to estimate standard errors and check convergence of the model. The simulation function is called once for each iteration in these phases and also once to initialise the model fitting and once to complete it. Unless in batch mode, a tcl/tk screen is displayed to allow interruption and to show progress.

If x$maxlike=TRUE, estimation is done by Maximum Likelihood implemented as in Snijders, Koskinen and Schweinberger (2010); also using the three-phase Robbins-Monro algorithm.

It is necessary to check that convergence has been achieved. The rule of thumb is that the all t-ratios for convergence should be in absolute value less than 0.1 and the overall maximum convergence ratio should be less than 0.25. If this was not achieved, the result can be used to start another estimation run from the estimate obtained, using the parameter prevAns as illustrated in the example below. (This parameter is inserted through '...' into the function initializeFRAN.)

For good estimation of standard errors, it is necessary that x$n3 is large enough. More about this is in the manual. The default value x$n3 set in sienaAlgorithmCreate is adequate for most explorative use, but for presentation in publications larger values are necessary, depending on the data set and model; e.g., x$n3=3000 or larger.

Parameters can be tested against zero by dividing the estimate by its standard error and using an approximate standard normal null distribution. Further, functions Wald.RSiena and Multipar.RSiena are available for multi-parameter testing.
Parameters specified in includeEffects or setEffect with fix=TRUE, test=TRUE will not be estimated; score tests of their hypothesized values are reported in the output file specified in the control (algorithm) object. These tests can be obtained also using score.Test.

If x$simOnly is TRUE, which is meant to go together with x$nsub=0, the calculation of the standard errors and covariance matrix at the end of Pase 3 is skipped. No estimation is performed. If thetaValues is not NULL, the parameter values in the rows of this matrix will be used in the consecutive runs of Phase 3. If x$n3 is larger than the number of rows times nbrNodes (see below), the last row of thetaValues will continue to be used. The parameter values actually used will be stored in the output matrix thetaUsed.

In the case of using multiple processors, there are two options for telling siena07 to use them. By specifying the options useCluster, nbrNodes, clusterString and initC, siena07 will create a cluster object that will be used by the parallel package. After finishing the estimation procedure, siena07 will automatically stop the cluster. Alternatively, instead of having the function to create a cluster, the user may provide its own by specifying the option cl, similar to what the boot function does in the boot package. By using the option cl the user may be able to create more complex clusters (see examples below).

If thetaValues is not NULL and nbrNodes >= 2, parameters in Phase 3 will be constant for each set of nbrNodes consecutive simulations. This must be noted in the interpretation, and will be visible in thetaUsed (see below).

Value

Returns an object of class sienaFit, some parts of which are:

`OK`	Boolean indicating successful termination
`termination`	Character string, values: "OK", "Error", or "UserInterrupt". "UserInterrupt" indicates that the user asked for early termination before phase 3.
`f`	Various characteristics of the data and model definition.
`requestedEffects`	The included effects in the effects object.
`effects`	The included effects in the effects object to which are added the main effects of the requested interaction effects, if any.
`theta`	Estimated value of theta, if `x$simOnly=FALSE`.
`covtheta`	Estimated covariance matrix of theta; this is not available if `x$simOnly=TRUE`.
`se`	Vector of standard errors of estimated theta, if `x$simOnly=FALSE`.
`dfra`	Matrix of estimated derivatives.
`sf`	Matrix of simulated deviations from targets in phase 3.
`sf2`	Array of periodwise deviations from simulations in phase 3. Not included if `x$lessMem=FALSE`.
`tconv`	t-statistics for convergence.
`tmax`	maximum absolute t-statistic for convergence for non-fixed parameters.
`tconv.max`	overall maximum convergence ratio.
`targets`	Observed statistics; for ML, zero vector.
`targets2`	Observed statistics by wave, starting with second wave; for ML, zero matrix.
`ssc`	Score function contributions for each wave for each simulation in phase 3. Not included if finite difference method is used or if `x$lessMem=FALSE`.
`scores`	Score functions, added over waves, for each simulation in phase 3. Only included if `x$lessMem=FALSE`.
`regrCoef`	If `x$dolby` and not `x$maxlike`: regression coefficients of estimation statistics on score functions.
`regrCor`	If `x$dolby` and not `x$maxlike`: correlations between estimation statistics and score functions.
`estMeans`	Estimated means of estimation statistics.
`estMeans.sem`	If `x$simOnly`: Standard errors of the estimated means of estimation statistics.
`sims`	If `returnDeps=TRUE`: list of simulated dependent variables (networks, behaviour). Networks are given as a list of edgelists, one for each period. The structure of sims is a nested list: `sims[[run]][[group]][[dependent variable]][[period]]`. If `x$maxlike=TRUE` and there is only one group and one period, the structure is `[[run]][[dependent variable]]`.
`chain`	If `returnChains = TRUE`: list, or data frame, of simulated chains of ministeps. The chain has the structure `chain[[run]][[depvar]][[period]][[ministep]]`.
`Phase3nits`	Number of iterations actually performed in phase 3.
`thetaUsed`	If `thetaValues` is not `NULL`, the matrix of parameter values actually used in the simulations of Phase 3.

Writes text output to the file named "projname.txt", where projname is defined in the sienaAlgorithm object x.

Author(s)

Ruth Ripley, Tom Snijders, Viviana Amati, Felix Schoenenberger

References

Amati, Viviana, Schoenenberger, Felix, and Snijders, Tom A. B. (2015). Estimation of stochastic actor-oriented models for the evolution of networks by generalized method of moments. Journal de la Societe Francaise de Statistique 156, 140-165.
Amati, Viviana, Schoenenberger, Felix, and Snijders, Tom A. B. (2019). Contemporaneous statistics for estimation in stochastic actor-oriented co-evolution models. Psychometrika 84, 1068-1096.
Schweinberger, Michael, and Snijders, Tom A.B. (2007). Markov models for digraph panel data: Monte Carlo-based derivative estimation. Computational Statistics and Data Analysis 51, 4465–4483.
Snijders, Tom A.B. (2001). The statistical evaluation of social network dynamics. Sociological Methodology 31, 361-395.
Snijders, Tom A.B. (2017). Stochastic Actor-Oriented Models for Network Dynamics. Annual Review of Statistics and Its Application 4, 343–363.
Snijders, Tom A. B., Koskinen, Johan, and Schweinberger, Michael (2010). Maximum likelihood estimation for social network dynamics. Annals of Applied Statistics 4, 567–588.
Snijders, Tom A.B., Steglich, Christian E.G., and Schweinberger, Michael (2007). Modeling the co-evolution of networks and behavior. Pp. 41–71 in Longitudinal models in the behavioral and related sciences, edited by Kees van Montfort, Han Oud and Albert Satorra; Lawrence Erlbaum.
Steglich, Christian E. G., Snijders, Tom A. B., and Pearson, Michael A. (2010). Dynamic networks and behavior: Separating selection from influence. Sociological Methodology 40, 329–393.
Information about the implementation of the algorithm is in http://www.stats.ox.ac.uk/~snijders/siena/Siena_algorithms.pdf.
Further see http://www.stats.ox.ac.uk/~snijders/siena/ .

Examples

myalgorithm <- sienaAlgorithmCreate(nsub=2, n3=100, seed=1293)
# nsub=2, n3=100 is used here for having a brief computation, not for practice.
mynet1 <- sienaDependent(array(c(tmp3, tmp4), dim=c(32, 32, 2)))
mydata <- sienaDataCreate(mynet1)
myeff <- getEffects(mydata)
ans <- siena07(myalgorithm, data=mydata, effects=myeff, batch=TRUE)

# or for non-conditional estimation --------------------------------------------
## Not run: 
model <- sienaAlgorithmCreate(nsub=2, n3=100, cond=FALSE, seed=1283)
ans <- siena07(myalgorithm, data=mydata, effects=myeff, batch=TRUE)
        
## End(Not run)

# or if a previous "on track" result ans was obtained --------------------------
## Not run: 
ans1 <- siena07(myalgorithm, data=mydata, effects=myeff, prevAns=ans)
         
## End(Not run)

# Running in multiple processors -----------------------------------------------
## Not run: 
# Not tested because dependent on presence of processors
# Find out how many processors there are
library(parallel)
(n.clus <- detectCores() - 1)
# number of cores; 1 is subtracted to keep time for other processes
ans2 <- siena07(myalgorithm, data=mydata, effects=myeff, batch=TRUE,
                useCluster=TRUE, nbrNodes=n.clus, initC=TRUE)

# Suppose 8 processors are going to be used.
# Loading the parallel package and creating a cluster
# with 8 processors (this should be equivalent)

library(parallel)
cl <- makeCluster(n.clus)

ans3 <- siena07(myalgorithm, data=mydata, effects=myeff, batch=TRUE, cl = cl)

# Notice that now -siena07- perhaps won't stop the cluster for you.
# stopCluster(cl)

# You can create even more complex clusters using several computers. In this
# example we are creating a cluster with 3*8 = 24 processors on three
# different machines.
cl <- makePSOCKcluster(
    rep(c('localhost', 'machine2.website.com' , 'machine3.website.com'), 8),
    user='myusername', rshcmd='ssh -p PORTNUMBER')

ans4 <- siena07(myalgorithm, data=mydata, effects=myeff, batch=TRUE, cl = cl)
stopCluster(cl)

## End(Not run)
# Accessing simulated networks for ML ------------------------------------------
# The following is an example for accessing the simulated networks for ML,
# which makes sense only if there are some missing tie variables;
# observed tie variables are identically simulated
# at the moment of observation,
# missing tie variable are imputed in a model-based way.
mat1 <- matrix(c(0,0,1,1,
                 1,0,0,0,
                 0,0,0,1,
                 0,1,0,0),4,4, byrow=TRUE)
mat2 <- matrix(c(0,1,1,1,
                 1,0,0,0,
                 0,0,0,1,
                 0,0,1,0),4,4, byrow=TRUE)
mat3 <- matrix(c(0,1,0,1,
                 1,0,0,0,
                 0,0,0,0,
                 NA,1,1,0),4,4, byrow=TRUE)
mats <- array(c(mat1,mat2,mat3), dim=c(4,4,3))
net <- sienaDependent(mats, allowOnly=FALSE)
sdat <- sienaDataCreate(net)
alg <- sienaAlgorithmCreate(maxlike=TRUE, nsub=3, n3=100, seed=12534)
effs <- getEffects(sdat)
(ans <- siena07(alg, data=sdat, effects=effs, returnDeps=TRUE, batch=TRUE))
# See manual Section 9.1 for information about the following functions
edges.to.adj <- function(x,n){
# create empty adjacency matrix
    adj <- matrix(0, n, n)
# put edge values in desired places
    adj[x[, 1:2]] <- x[, 3]
    adj
}
the.edge <- function(x,n,h,k){
    edges.to.adj(x,n)[h,k]
}
# Now show the results
n <- 4
ego <- rep.int(1:n,n)
alter <- rep(1:n, each=n)
# Get the average simulated adjacency matrices for wave 3 (period 2):
ones <- sapply(1:n^2, function(i)
    {mean(sapply(ans$sims,
           function(x){the.edge(x[[1]][[2]][[1]],n,ego[i],alter[i])}))})
# Note that for maximum likelihood estimation,
# if there is one group and one period,
# the nesting levels for group and period are dropped from ans$sims.
cbind(ego,alter,ones)
matrix(ones,n,n)

RSiena

Siena - Simulation Investigation for Empirical Network Analysis

v1.3.0

GPL-2 | GPL-3 | file LICENSE

Authors

Tom A.B. Snijders [cre, aut] (<https://orcid.org/0000-0003-3157-4157>), Ruth Ripley [aut], Christian Steglich [aut, ctb] (<https://orcid.org/0000-0002-9097-0873>), Johan Koskinen [aut, ctb] (<https://orcid.org/0000-0002-6860-325X>), Nynke Niezink [aut, ctb], Viviana Amati [aut, ctb] (<https://orcid.org/0000-0003-1190-1237>), Christoph Stadtfeld [ctb] (<https://orcid.org/0000-0002-2704-2134>), James Hollway [ctb] (IHEID, <https://orcid.org/0000-0002-8361-9647>), Per Block [ctb], Robert Krause [ctb], Charlotte Greenan [ctb], Josh Lospinoso [ctb], Michael Schweinberger [ctb] (<https://orcid.org/0000-0003-3649-5386>), Mark Huisman [ctb], Krists Boitmanis [ctr], Felix Schoenenberger [ctb], Mark Ortmann [ctb], Marion Hoffman [ctb], Robert Hellpap [ctb], Alvaro Uzaheta [ctb]

Initial release

2021-05-02