Newton- and Quasi-Newton Maximization
Unconstrained and equality-constrained maximization based on the quadratic approximation (Newton) method. The Newton-Raphson, BFGS (Broyden 1970, Fletcher 1970, Goldfarb 1970, Shanno 1970), and BHHH (Berndt, Hall, Hall, Hausman 1974) methods are available.
maxNR(fn, grad = NULL, hess = NULL, start, constraints = NULL, finalHessian = TRUE, bhhhHessian=FALSE, fixed = NULL, activePar = NULL, control=NULL, ... ) maxBFGSR(fn, grad = NULL, hess = NULL, start, constraints = NULL, finalHessian = TRUE, fixed = NULL, activePar = NULL, control=NULL, ... ) maxBHHH(fn, grad = NULL, hess = NULL, start, finalHessian = "BHHH", ... )
fn |
the function to be maximized.
It must have the parameter vector as the first argument and
it must return either a single number, or a numeric vector (this is
is summed internally).
If the BHHH method is used and argument
|
grad |
gradient of the objective function.
It must have the parameter vector as the first argument and
it must return either a gradient vector of the objective function,
or a matrix, where columns correspond to individual parameters.
The column sums are treated as gradient components.
If |
hess |
Hessian matrix of the function.
It must have the parameter vector as the first argument and
it must return the Hessian matrix of the objective function.
If missing, finite-difference Hessian, based on |
start |
initial parameter values. If start values are named, those names are also carried over to the results. |
constraints |
either |
finalHessian |
how (and if) to calculate the final Hessian. Either
|
bhhhHessian |
logical. Indicating whether to use the information
equality approximation (Bernd, Hall, Hall, and Hausman, 1974) for
the Hessian. This effectively transforms |
fixed |
parameters to be treated as constants at their
|
activePar |
this argument is retained for backward compatibility only;
please use argument |
control |
list of control parameters. The control parameters used by these optimizers are
|
... |
further arguments to |
The idea of the Newton method is to approximate the function at a given location by a multidimensional quadratic function, and use the estimated maximum as the start value for the next iteration. Such an approximation requires knowledge of both gradient and Hessian, the latter of which can be quite costly to compute. Several methods for approximating Hessian exist, including BFGS and BHHH.
The BHHH (information equality) approximation is only valid for
log-likelihood functions.
It requires the score (gradient) values by individual observations and hence
those must be returned
by individual observations by grad
or fn
.
The Hessian is approximated as the negative of the sum of the outer products
of the gradients of individual observations, or, in the matrix form,
\code{H = -t(gradient) %*% gradient = - crossprod( gradient )}.
The functions maxNR
, maxBFGSR
, and maxBHHH
can work with constant parameters, useful if a parameter value
converges to the boundary of support, or for testing.
One way is to put
fixed
to non-NULL, specifying which parameters should be treated as
constants. The
parameters can also be fixed in runtime (only for maxNR
and maxBHHH
) by
signaling it with the
fn
return value. See Henningsen & Toomet (2011) for details.
object of class "maxim". Data can be extracted through the following methods:
|
|
|
estimated parameter value. |
gradient |
vector, last calculated gradient value. Should be close to 0 in case of normal convergence. |
estfun |
matrix of gradients at parameter value |
hessian |
Hessian at the maximum (the last calculated value if not converged). |
returnCode |
return code:
|
returnMessage |
a short message, describing the return code. |
activePar |
logical vector, which parameters are optimized over.
Contains only |
nIter |
number of iterations. |
maximType |
character string, type of maximization. |
maxControl |
the optimization control parameters in the form of a
|
The following components can only be extracted directly (with \$
):
last.step |
a list describing the last unsuccessful step if
|
constraints |
A list, describing the constrained optimization
(
|
No attempt is made to ensure that user-provided analytic
gradient/Hessian is correct. The users are
encouraged to use compareDerivatives
function,
designed for this purpose. If analytic gradient/Hessian are wrong,
the algorithm may not converge, or may converge to a wrong point.
As the BHHH method uses the likelihood-specific information equality, it is only suitable for maximizing log-likelihood functions!
Quasi-Newton methods, including those mentioned above, do not work
well in non-concave regions. This is especially the case with the
implementation in maxBFGSR
. The user is advised to
experiment with various tolerance options to achieve convergence.
Ott Toomet, Arne Henningsen,
function maxBFGSR
was originally developed by Yves Croissant
(and placed in 'mlogit' package)
Berndt, E., Hall, B., Hall, R. and Hausman, J. (1974): Estimation and Inference in Nonlinear Structural Models, Annals of Social Measurement 3, 653–665.
Broyden, C.G. (1970): The Convergence of a Class of Double-rank Minimization Algorithms, Journal of the Institute of Mathematics and Its Applications 6, 76–90.
Fletcher, R. (1970): A New Approach to Variable Metric Algorithms, Computer Journal 13, 317–322.
Goldfarb, D. (1970): A Family of Variable Metric Updates Derived by Variational Means, Mathematics of Computation 24, 23–26.
Henningsen, A. and Toomet, O. (2011): maxLik: A package for maximum likelihood estimation in R Computational Statistics 26, 443–458
Marquardt, D.W., (1963) An Algorithm for Least-Squares Estimation of Nonlinear Parameters, Journal of the Society for Industrial & Applied Mathematics 11, 2, 431–441
Shanno, D.F. (1970): Conditioning of Quasi-Newton Methods for Function Minimization, Mathematics of Computation 24, 647–656.
maxLik
for a general framework for maximum likelihood
estimation (MLE);
maxBHHH
for maximizations using the Berndt, Hall, Hall,
Hausman (1974) algorithm (which is a wrapper function to maxNR
);
maxBFGS
for maximization using the BFGS, Nelder-Mead (NM),
and Simulated Annealing (SANN) method (based on optim
),
also supporting inequality constraints;
nlm
for Newton-Raphson optimization; and
optim
for different gradient-based optimization
methods.
## Fit exponential distribution by ML t <- rexp(100, 2) # create data with parameter 2 loglik <- function(theta) sum(log(theta) - theta*t) ## Note the log-likelihood and gradient are summed over observations gradlik <- function(theta) sum(1/theta - t) hesslik <- function(theta) -100/theta^2 ## Estimate with finite-difference gradient and Hessian a <- maxNR(loglik, start=1, control=list(printLevel=2)) summary(a) ## You would probably prefer 1/mean(t) instead ;-) ## The same example with analytic gradient and Hessian a <- maxNR(loglik, gradlik, hesslik, start=1) summary(a) ## BFGS estimation with finite-difference gradient a <- maxBFGSR( loglik, start=1 ) summary(a) ## For the BHHH method we need likelihood values and gradients ## of individual observations, not the sum of those loglikInd <- function(theta) log(theta) - theta*t gradlikInd <- function(theta) 1/theta - t ## Estimate with analytic gradient a <- maxBHHH(loglikInd, gradlikInd, start=1) summary(a) ## Example with a vector argument: Estimate the mean and ## variance of a random normal sample by maximum likelihood ## Note: you might want to use maxLik instead loglik <- function(param) { # param is a 2-vector of c(mean, sd) mu <- param[1] sigma <- param[2] ll <- -0.5*N*log(2*pi) - N*log(sigma) - sum(0.5*(x - mu)^2/sigma^2) ll } x <- rnorm(100, 1, 2) # use mean=1, sd=2 N <- length(x) res <- maxNR(loglik, start=c(0,1)) # use 'wrong' start values summary(res) ## The previous example with named parameters and a fixed value resFix <- maxNR(loglik, start=c(mu=0, sigma=1), fixed="sigma") summary(resFix) # 'sigma' is exactly 1.000 now. ### Constrained optimization ### ## We maximize exp(-x^2 - y^2) where x+y = 1 hatf <- function(theta) { x <- theta[1] y <- theta[2] exp(-(x^2 + y^2)) ## Note: you may prefer exp(- theta %*% theta) instead } ## use constraints: x + y = 1 A <- matrix(c(1, 1), 1, 2) B <- -1 res <- maxNR(hatf, start=c(0,0), constraints=list(eqA=A, eqB=B), control=list(printLevel=1)) print(summary(res))
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.