Maximum pseudolikelihood estimation in complex surveys
Maximises a user-specified likelihood parametrised by multiple linear predictors to data from a complex sample survey and computes the sandwich variance estimator of the coefficients. Note that this function maximises an estimated population likelihood, it is not the sample MLE.
svymle(loglike, gradient = NULL, design, formulas, start = NULL, control = list(), na.action="na.fail", method=NULL, lower=NULL,upper=NULL,influence=FALSE,...) ## S3 method for class 'svymle' summary(object, stderr=c("robust", "model"),...)
loglike |
vectorised loglikelihood function |
gradient |
Derivative of |
design |
a |
formulas |
A list of formulas specifying the variable and linear predictors: see Details below |
start |
Starting values for parameters |
control |
control options for the optimiser: see the help page for the optimiser you are using. |
lower,upper |
Parameter bounds for |
influence |
Return the influence functions (primarily for svyby) |
na.action |
Handling of |
method |
|
... |
Arguments to |
object |
|
stderr |
Choice of standard error estimator. The default is a standard sandwich estimator. See Details below. |
The design
object contains all the data and design information
from the survey, so all the formulas refer to variables in this object.
The formulas
argument needs to specify the response variable and
a linear predictor for each freely varying argument of loglike
.
Consider for example the dnorm
function, with arguments
x
, mean
, sd
and log
, and suppose we want to
estimate the mean of y
as a linear function of a variable
z
, and to estimate a constant standard deviation. The log
argument must be fixed at FALSE
to get the loglikelihood. A
formulas
argument would be list(~y, mean=~z, sd=~1)
. Note
that the data variable y
must be the first argument to
dnorm
and the first formula and that all the other formulas are
labelled. It is also permitted to have the data variable as the
left-hand side of one of the formulas: eg list( mean=y~z, sd=~1)
.
The two optimisers from the minqa
package do not use any
derivatives to be specified for optimisation, but they do assume
that the function is smooth enough for a quadratic approximation, ie,
that two derivatives exist.
The usual variance estimator for MLEs in a survey sample is a ‘sandwich’
variance that requires the score vector and the information matrix. It
requires only sampling assumptions to be valid (though some model
assumptions are required for it to be useful). This is the
stderr="robust"
option, which is available only when the gradient
argument was specified.
If the model is correctly specified and the sampling is at random
conditional on variables in the model then standard errors based on just
the information matrix will be approximately valid. In particular, for
independent sampling where weights and strata depend on variables in the
model the stderr="model"
should work fairly well.
An object of class svymle
Thomas Lumley
data(api) dstrat<-svydesign(id=~1, strata=~stype, weight=~pw, fpc=~fpc, data=apistrat) ## fit with glm m0 <- svyglm(api00~api99+ell,family="gaussian",design=dstrat) ## fit as mle (without gradient) m1 <- svymle(loglike=dnorm,gradient=NULL, design=dstrat, formulas=list(mean=api00~api99+ell, sd=~1), start=list(c(80,1,0),c(20)), log=TRUE) ## with gradient gr<- function(x,mean,sd,log){ dm<-2*(x - mean)/(2*sd^2) ds<-(x-mean)^2*(2*(2 * sd))/(2*sd^2)^2 - sqrt(2*pi)/(sd*sqrt(2*pi)) cbind(dm,ds) } m2 <- svymle(loglike=dnorm,gradient=gr, design=dstrat, formulas=list(mean=api00~api99+ell, sd=~1), start=list(c(80,1,0),c(20)), log=TRUE, method="BFGS") summary(m0) summary(m1,stderr="model") summary(m2) ## Using offsets m3 <- svymle(loglike=dnorm,gradient=gr, design=dstrat, formulas=list(mean=api00~api99+offset(ell)+ell, sd=~1), start=list(c(80,1,0),c(20)), log=TRUE, method="BFGS") ## demonstrating multiple linear predictors m3 <- svymle(loglike=dnorm,gradient=gr, design=dstrat, formulas=list(mean=api00~api99+offset(ell)+ell, sd=~stype), start=list(c(80,1,0),c(20,0,0)), log=TRUE, method="BFGS") ## More complicated censored lognormal data example ## showing that the response variable can be multivariate data(pbc, package="survival") pbc$randomized <- with(pbc, !is.na(trt) & trt>0) biasmodel<-glm(randomized~age*edema,data=pbc) pbc$randprob<-fitted(biasmodel) dpbc<-svydesign(id=~1, prob=~randprob, strata=~edema, data=subset(pbc,randomized)) ## censored logNormal likelihood lcens<-function(x,mean,sd){ ifelse(x[,2]==1, dnorm(log(x[,1]),mean,sd,log=TRUE), pnorm(log(x[,1]),mean,sd,log=TRUE,lower.tail=FALSE) ) } gcens<- function(x,mean,sd){ dz<- -dnorm(log(x[,1]),mean,sd)/pnorm(log(x[,1]),mean,sd,lower.tail=FALSE) dm<-ifelse(x[,2]==1, 2*(log(x[,1]) - mean)/(2*sd^2), dz*-1/sd) ds<-ifelse(x[,2]==1, (log(x[,1])-mean)^2*(2*(2 * sd))/(2*sd^2)^2 - sqrt(2*pi)/(sd*sqrt(2*pi)), ds<- dz*-(log(x[,1])-mean)/(sd*sd)) cbind(dm,ds) } m<-svymle(loglike=lcens, gradient=gcens, design=dpbc, method="newuoa", formulas=list(mean=I(cbind(time,status>0))~bili+protime+albumin, sd=~1), start=list(c(10,0,0,0),c(1))) summary(m) ## the same model, but now specifying the lower bound of zero on the ## log standard deviation mbox<-svymle(loglike=lcens, gradient=gcens, design=dpbc, method="bobyqa", formulas=list(mean=I(cbind(time,status>0))~bili+protime+albumin, sd=~1), lower=list(c(-Inf,-Inf,-Inf,-Inf),0), upper=Inf, start=list(c(10,0,0,0),c(1))) ## The censored lognormal model is now available in svysurvreg() summary(svysurvreg(Surv(time,status>0)~bili+protime+albumin, design=dpbc,dist="lognormal")) ## compare svymle scale value after log transformation svycontrast(m, quote(log(`sd.(Intercept)`)))
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.