Simulated Tempering and Umbrella Sampling
Markov chain Monte Carlo (MCMC) for continuous random vectors using
parallel or serial tempering, the latter also called umbrella
sampling and simulated tempering.
The chain simulates k different distributions on the
same state space. In parallel tempering, all the distributions are
simulated in each iteration. In serial tempering, only one of the
distributions is simulated (a random one). In parallel tempering,
the state is a k by p matrix, each row of which
is the state for one of the distributions.
In serial tempering the state of the Markov chain is a pair (i, x),
where i is an integer between 1 and k and x is a vector
of length p. This pair is represented as a single real vector
c(i, x)
. The variable i says which distribution x
is a simulation for.
temper(obj, initial, neighbors, nbatch, blen = 1, nspac = 1, scale = 1, outfun, debug = FALSE, parallel = FALSE, ...) ## S3 method for class 'function' temper(obj, initial, neighbors, nbatch, blen = 1, nspac = 1, scale = 1, outfun, debug = FALSE, parallel = FALSE, ...) ## S3 method for class 'tempering' temper(obj, initial, neighbors, nbatch, blen = 1, nspac = 1, scale = 1, outfun, debug = FALSE, parallel = FALSE, ...)
obj |
either an R function or an object of class If a function, it should evaluate the log unnormalized density log h(i, x) of the desired equilibrium distribution of the Markov chain for serial tempering (the same function is used for both serial and parallel tempering, see Details below for further explanation). If an object of class The first argument of the log unnormalized density function is the
is an R vector |
initial |
for serial tempering, a real vector |
neighbors |
a logical symmetric matrix of dimension |
nbatch |
the number of batches. |
blen |
the length of batches. |
nspac |
the spacing of iterations that contribute to batches. |
scale |
controls the proposal step size for real elements of the state
vector. For serial tempering, proposing a new value for the x
part of the state (i, x). For parallel tempering, proposing
a new value for the x[i] part of the state
(x[1], …, x[k]). In either case, the proposal
is a real vector of length p. If scalar or vector, the proposal
is |
outfun |
controls the output. If a function, then the batch means
of |
debug |
if |
parallel |
if |
... |
additional arguments for |
Serial tempering simulates a mixture of distributions of a continuous random
vector. The number of components of the mixture is k
, and the dimension
of the random vector is p
. Denote the state (i, x), where i
is a positive integer between 1 and k, and let h(i, x) denote
the unnormalized joint density of their equilibrium distribution.
The logarithm of this function is what obj
or obj$lud
calculates.
The mixture distribution is the marginal for x derived from
the equilibrium distribution h(i, x), that is,
h(x) = sum[i = 1 to k] h(i, x)
Parallel tempering simulates a product of distributions of a continuous random vector. Denote the state (x[1], …, x[k]), then the unnormalized joint density of the equilibrium distribution is
h(x[1], …, x[k]) = prod[i = 1 to k] h(i, x[i])
The update mechanism of the Markov chain combines two kinds of elementary updates: jump/swap updates (jump for serial tempering, swap for parallel tempering) and within-component updates. Each iteration of the Markov chain one of these elementary updates is done. With probability 1/2 a jump/swap update is done, and with probability 1/2 a with-component update is done.
Within-component updates are the same for both serial and parallel tempering.
They are “random-walk” Metropolis updates with multivariate normal
proposal, the proposal distribution being determined by the argument
scale
. In serial tempering, the x part of the current state
(i, x) is updated preserving h(i, x).
In parallel tempering, an index i is chosen at random and the part
of the state x[i] representing that component is updated,
again preserving h(i, x).
Jump updates choose uniformly at random a neighbor of the current component:
if i indexes the current component, then it chooses uniformly at random
a j such that neighbors[i, j] == TRUE
. It then does does a
Metropolis-Hastings update for changing the current state from (i, x)
to (j, x).
Swap updates choose a component uniformly at random and a neighbor of that
component uniformly at random: first an index i is chosen uniformly
at random between 1 and k, then an index j is chosen
uniformly at random such that neighbors[i, j] == TRUE
. It then does
does a Metropolis-Hastings update for swapping the states of the
two components: interchanging x[i, ] and x[j, ]
while preserving h(x[1], …, x[k]).
The initial state must satisfy lud(initial, ...) > - Inf
for serial
tempering or must satisfy lud(initial[i, ], ...) > - Inf
for each
i
for parallel tempering, where lud
is either obj
or obj$lud
.
That is, the initial state must have positive probability.
an object of class "mcmc"
, subclass "tempering"
,
which is a list containing at least the following components:
batch |
the batch means of the continuous part of the state.
If |
ibatch |
(returned for serial tempering only) an |
acceptx |
fraction of Metropolis within-component proposals accepted.
A vector of length |
accepti |
fraction of Metropolis jump/swap proposals accepted.
A |
initial |
value of argument |
final |
final state of Markov chain. |
initial.seed |
value of |
final.seed |
value of |
time |
running time of Markov chain from |
lud |
the function used to calculate log unnormalized density,
either |
nbatch |
the argument |
blen |
the argument |
nspac |
the argument |
outfun |
the argument |
Description of additional output when debug = TRUE
can be
found in the vignette debug
, which is shown by
vignette("debug", "mcmc")
.
If outfun
is missing, then the log unnormalized
density function can be defined without a ... argument and that works fine.
One can define it starting ludfun <- function(state)
and that works
or ludfun <- function(state, foo, bar)
, where foo
and bar
are supplied as additional arguments to temper
and that works too.
If outfun
is a function, then both it and the log unnormalized
density function can be defined without ... arguments if they
have exactly the same arguments list and that works fine. Otherwise it
doesn't work. Define these functions by
ludfun <- function(state, foo) outfun <- function(state, bar)
and you get an error about unused arguments. Instead define these functions by
ludfun <- function(state, foo, \ldots) outfun <- function(state, bar, \ldots)
and supply foo
and bar
as additional arguments to temper
,
and that works fine.
In short, the log unnormalized density function and outfun
need
to have ... in their arguments list to be safe. Sometimes it works
when ... is left out and sometimes it doesn't.
Of course, one can avoid this whole issue by always defining the log
unnormalized density function and outfun
to have only one argument
state
and use global variables (objects in the R global environment) to
specify any other information these functions need to use. That too
follows the R way. But some people consider that bad programming practice.
A third option is to define either or both of these functions using a function
factory. This is demonstrated in the vignette for this package named
demo
, which is shown by vignette("demo", "mcmc")
.
This function follows the philosophy of MCMC explained
the introductory chapter of the
Handbook of Markov Chain Monte Carlo (Geyer, 2011a)
and in the chapter of that book on tempering and related subjects
(Geyer, 2011b).
See also the section on philosophy of metrop
.
The scale
argument must be adjusted so that the acceptance rate
for within-component proposals (component acceptx
of the result
returned by this function)
is not too low or too high to get reasonable performance.
The log unnormalized density function must be chosen so that the acceptance rate
for jump/swap proposals (component accepti
of the result
returned by this function)
is not too low or too high to get reasonable performance.
The former is a vector and the latter is a matrix, and
all these rates must be adjusted to be reasonable.
The rates in in accepti
are influenced by the number of components
of the tempering mixture distribution, by what those components are (how
far apart they are in some unspecified metric on probability distributions),
and by the chosen normalizing constants for those distributions.
For examples of tuning tempering, see Geyer (2011b) and also the vignette
of this package shown by vignette("bfst", "mcmc")
.
The help for R function metrop
also gives advice on tuning
its sampler, which is relevant for tuning the acceptx
rates.
See also Geyer (1991) and Geyer and Thompson (1995) for the general theory of tuning parallel and serial tempering.
Geyer, C. J. (1991) Markov chain Monte Carlo maximum likelihood. Computing Science and Statistics: Proc. 23rd Symp. Interface, 156–163. http://hdl.handle.net/11299/58440.
Geyer, C. J. (2011a) Introduction to MCMC. In Handbook of Markov Chain Monte Carlo. Edited by S. P. Brooks, A. E. Gelman, G. L. Jones, and X. L. Meng. Chapman & Hall/CRC, Boca Raton, FL, pp. 3–48.
Geyer, C. J. (2011b) Importance Sampling, Simulated Tempering, and Umbrella Sampling. In Handbook of Markov Chain Monte Carlo. Edited by S. P. Brooks, A. E. Gelman, G. L. Jones, and X. L. Meng. Chapman & Hall/CRC, Boca Raton, FL, pp. 295–312.
Geyer, C. J. and Thompson, E. A. (1995) Annealing Markov chain Monte Carlo with applications to ancestral inference. Journal of the American Statistical Association 90 909–920.
d <- 9 witch.which <- c(0.1, 0.3, 0.5, 0.7, 1.0) ncomp <- length(witch.which) neighbors <- matrix(FALSE, ncomp, ncomp) neighbors[row(neighbors) == col(neighbors) + 1] <- TRUE neighbors[row(neighbors) == col(neighbors) - 1] <- TRUE ludfun <- function(state, log.pseudo.prior = rep(0, ncomp)) { stopifnot(is.numeric(state)) stopifnot(length(state) == d + 1) icomp <- state[1] stopifnot(icomp == as.integer(icomp)) stopifnot(1 <= icomp && icomp <= ncomp) stopifnot(is.numeric(log.pseudo.prior)) stopifnot(length(log.pseudo.prior) == ncomp) theta <- state[-1] if (any(theta > 1.0)) return(-Inf) bnd <- witch.which[icomp] lpp <- log.pseudo.prior[icomp] if (any(theta > bnd)) return(lpp) return(- d * log(bnd) + lpp) } # parallel tempering thetas <- matrix(0.5, ncomp, d) out <- temper(ludfun, initial = thetas, neighbors = neighbors, nbatch = 20, blen = 10, nspac = 5, scale = 0.56789, parallel = TRUE, debug = TRUE) # serial tempering theta.initial <- c(1, rep(0.5, d)) # log pseudo prior found by trial and error qux <- c(0, 9.179, 13.73, 16.71, 20.56) out <- temper(ludfun, initial = theta.initial, neighbors = neighbors, nbatch = 50, blen = 30, nspac = 2, scale = 0.56789, parallel = FALSE, debug = FALSE, log.pseudo.prior = qux)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.