Hauck-Donner Effects: A Detection Test for Wald Tests
A detection test for the Hauck-Donner effect on each regression coefficient of a VGLM regression or 2 x 2 table.
hdeff(object, ...)
hdeff.vglm(object, derivative = NULL, se.arg = FALSE, subset = NULL,
           hstep = 0.005, fd.only = FALSE, ...)
hdeff.numeric(object, byrow = FALSE, ...)
hdeff.matrix(object, ...)
object |
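Usually a vglm object. Alternatively, object may be a 2 x 2 table (matrix) of counts. Another alternative is that object is a numeric vector of length 4 representing such a table; see byrow and the examples. |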
derivative |
Numeric. Either 1 or 2.
Currently only a few models having one linear predictor are handled
analytically for |
se.arg |
Logical. If |
subset |
Logical or vector of indices,
to select the regression coefficients of interest.
The default is to select all coefficients.
If logical, it is recycled if necessary.
If numeric, the values should comprise
elements from |
hstep |
Positive numeric and recycled to length 2;
it is the so-called step size when using
finite-differences and is often called h in the calculus
literature,
e.g., f'(x) is approximately (f(x+h) - f(x)) / h.
For the 2nd-order partial derivatives, there are two step sizes
and hence this argument is recycled to length 2.
The default is to have the same values.
The 1st-order derivatives use the first value only.
It is recommended that a few values of this argument be tried
because values of the first and second derivatives can
vary accordingly.
If any values are too large then the derivatives may be inaccurate;
and if too small then the derivatives may be unstable and
subject to too much round-off/cancellation error
(in fact it may create an error or a |
fd.only |
Logical;
if |
byrow |
Logical;
fed into |
... |
currently unused but may be used in the future for further arguments passed into the other methods functions. |
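The trade-off described under hstep can be seen with a small base R sketch. It is only an illustration of the forward-difference formula quoted above, applied to exp() at x = 1, where the exact derivative is known; the helper name fd_deriv and the trial step sizes are assumptions made for this example and are not part of VGAM.

```r
# Forward-difference approximation: f'(x) is approximately (f(x + h) - f(x)) / h.
# fd_deriv is a hypothetical helper for illustration, not a VGAM function.
fd_deriv <- function(f, x, h) (f(x + h) - f(x)) / h

x0 <- 1
exact <- exp(x0)                       # The derivative of exp() is exp()
hvals <- c(1e-1, 5e-3, 1e-8, 1e-15)    # Too large, the hdeff default, small, too small
fd_est <- sapply(hvals, function(h) fd_deriv(exp, x0, h))
abs_err <- abs(fd_est - exact)

# Truncation error dominates for large h; round-off/cancellation for tiny h
print(data.frame(h = hvals, estimate = fd_est, abs_error = abs_err))
```

Trying several values, as recommended for hstep, shows how far the estimate moves with the step size.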
Almost all of statistical inference based on the likelihood assumes that the parameter estimates are located in the interior of the parameter space. The nonregular case of being located on the boundary is not considered very much and leads to very different results from the regular case. Practically, an important question is: how close is close to the boundary? One might answer this as: the parameter estimates are too close to the boundary when the Hauck-Donner effect (HDE) is present, whereby the Wald statistic becomes aberrant.
Hauck and Donner (1977) first observed an aberration of the Wald test
statistic: it does not increase monotonically as the distance between
the parameter estimate and the null value increases. This
"disturbing" and "undesirable" underappreciated effect has since been
observed in other regression models by various authors. This function
computes the first, and possibly second, derivative of the Wald
statistic for each regression coefficient. A negative value of the
first derivative is indicative of the HDE being present.
More information on HDE severity can be obtained from hdeffsev:
the amount of HDE present may be none, faint, weak, moderate,
strong or extreme.
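The non-monotonic behaviour of the Wald statistic can be sketched in base R for a 2 x 2 table. The example below is a hedged illustration, not the computation hdeff performs: it uses the standard closed-form log odds ratio and its standard error, keeps the baseline group of the Hauck and Donner (1977) data (25 successes out of 100), and lets the number of successes in the second group vary. The variable names are chosen here for illustration.

```r
# Baseline group (x2 = 0): R0 successes out of N0, as in the Examples
R0 <- 25; N0 <- 100
N1 <- 100
r1 <- 26:99                 # Hypothetical success counts for the second group

# Estimated log odds ratio and its usual large-sample standard error
log_or <- log(r1 / (N1 - r1)) - log(R0 / (N0 - R0))
se     <- sqrt(1 / r1 + 1 / (N1 - r1) + 1 / R0 + 1 / (N0 - R0))
wald_z <- log_or / se       # Wald statistic for H0: log odds ratio = 0

# The estimate keeps growing, but the Wald statistic eventually turns down
r_peak <- r1[which.max(wald_z)]
print(r_peak)
print(round(wald_z[r1 %in% c(80, 92, 99)], 3))
```

Past the peak, the standard error grows faster than the estimate, so stronger evidence against the null yields a smaller Wald statistic.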
In general, most models have derivatives that are computed
numerically using finite-difference approximations.
The reason is that it takes a lot of work to program in the
analytical solution, so this has been done for only a few very
common models, such as poissonff and binomialff,
where the first two derivatives have been implemented.
By default this function returns a labelled logical vector;
a TRUE means the HDE is present for that coefficient
(negative slope).
Hence ideally all values are FALSE.
Any TRUE values suggest that the MLE is
too near the boundary of the parameter space,
and that the p-value for that regression coefficient
is biased upwards.
When the HDE is present,
a highly significant variable might be deemed nonsignificant,
and thus the HDE can create havoc for variable selection.
If the HDE is present then more accurate
p-values can generally be obtained by conducting a
likelihood ratio test (see lrt.stat.vlm)
or Rao's score test (see score.stat.vlm);
indeed the default of wald.stat.vlm
does not suffer from the HDE.
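As a rough illustration of why a likelihood ratio test is preferred when the HDE is present, the sketch below fits a logistic regression to two 2 x 2 tables with base R's glm(). This is a stand-in chosen so the example runs without VGAM; it does not call lrt.stat.vlm or score.stat.vlm, and the helper compare_tests is a hypothetical name introduced here. The first table is the one used in the Examples (92 successes in the second group); the second moves further from the null (99 successes).

```r
# Hypothetical helper: Wald and likelihood ratio statistics for the slope
# of a logistic regression fitted to a 2 x 2 table of counts.
compare_tests <- function(r1, n1 = 100, r0 = 25, n0 = 100) {
  dat <- data.frame(x    = c(0, 1),
                    succ = c(r0, r1),
                    fail = c(n0 - r0, n1 - r1))
  fit <- glm(cbind(succ, fail) ~ x, family = binomial, data = dat)
  c(wald_z  = unname(coef(summary(fit))["x", "z value"]),
    lr_stat = fit$null.deviance - fit$deviance)  # Deviance drop from adding x
}

res <- sapply(c(r92 = 92, r99 = 99), compare_tests)
print(round(res, 2))
```

The likelihood ratio statistic increases as the second table moves further from the null, while the Wald statistic decreases.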
Setting deriv = 1 returns a numerical vector of first
derivatives of the Wald statistics.
Setting deriv = 2 returns a 2-column matrix of first
and second derivatives of the Wald statistics.
Then setting se.arg = TRUE returns an additional 1 or 2 columns.
Some 2nd derivatives are NA if
only a partial analytic solution has been programmed in.
For those VGAM family functions whose HDE test has not yet
been implemented explicitly (the vast majority of them),
finite-difference approximations
to the derivatives will be used; see the arguments
hstep and fd.only for some control over them.
The function summaryvglm conducts the HDE
detection test if possible and prints out a line at the bottom
if the HDE is detected for some regression coefficients.
The qualifier “if possible” is needed because a few family functions
are exempt; they have an infos slot with component hadof = FALSE,
such as normal.vcm, rec.normal because it
uses the BFGS-IRLS method for computing the working weights.
For these few a NULL is returned by hdeff.
If the second derivatives are of interest then
it is recommended that crit = "c"
be added to the
fitting so that a slightly more accurate model results
(usually one more IRLS iteration).
This is because the FD approximation is very sensitive to
values of the working weights, so they need to be computed
accurately.
Occasionally, if the coefficient is close to 0,
then its Wald statistic's
second derivative may be unusually large in magnitude
(this could be due to something such as roundoff error).
This function is currently under development
and may change a little in the near future.
For HDE severity measures see hdeffsev.
Yee (2018) gives details about HDE detection for the entire VGLM class, and proves a tipping point theorem with tipping points 1/4 and 3/5. The HDE severity measures allow partitioning of the parameter space into 6 regions from the interior and going outwards towards the boundary edges. It is also shown that with 1-parameter binary regression the HDE cannot occur unless the log odds ratio is at least 2.40, which corresponds to an odds ratio of 11.0 or more.
Thomas W. Yee.
Hauck, W. W., Jr. and Donner, A. (1977). Wald's test as applied to hypotheses in logit analysis. Journal of the American Statistical Association, 72, 851–853. Corrigenda: JASA, 75, 482.
Yee, T. W. (2018) On the Hauck-Donner effect in Wald tests: Detection, and parameter space characterization (submitted for publication).
pneumo <- transform(pneumo, let = log(exposure.time))
fit <- vglm(cbind(normal, mild, severe) ~ let, data = pneumo,
            trace = TRUE, crit = "c",  # Get some more accuracy
            cumulative(reverse = TRUE, parallel = TRUE))
cumulative()@infos()$hadof  # Analytical solution implemented
hdeff(fit)
hdeff(fit, deriv = 1)  # Analytical solution
hdeff(fit, deriv = 2)  # It is a partial analytical solution
hdeff(fit, deriv = 2, se.arg = TRUE,
      fd.only = TRUE)  # All derivatives solved numerically by FDs

# 2 x 2 table of counts
R0 <- 25; N0 <- 100  # Hauck Donner (1977) data set
mymat <- c(N0-R0, R0, 8, 92)  # HDE present
(mymat <- matrix(mymat, 2, 2, byrow = TRUE))
hdeff(mymat)
hdeff(c(mymat))  # Input is a vector
hdeff(c(t(mymat)), byrow = TRUE)  # Reordering of the data