Tweedie Distributions: mle estimation of p
Maximum likelihood estimation of the Tweedie index parameter power.
tweedie.profile(formula, p.vec=NULL, xi.vec=NULL, link.power=0, data, weights, offset, fit.glm=FALSE, do.smooth=TRUE, do.plot=FALSE, do.ci=do.smooth, eps=1/6, control=list( epsilon=1e-09, maxit=glm.control()$maxit, trace=glm.control()$trace ), do.points=do.plot, method="inversion", conf.level=0.95, phi.method=ifelse(method == "saddlepoint", "saddlepoint", "mle"), verbose=FALSE, add0=FALSE)
formula |
a formula expression as for other regression models and generalized linear models,
of the form |
p.vec |
a vector of |
xi.vec |
the same as |
link.power |
the power link function to use.
These link functions g() are of the form
g(eta) = eta^link.power,
and the special case of |
data |
an optional data frame, list or environment
(or object coercible by |
weights |
an optional vector of weights to be used in the fitting
process. Should be |
offset |
this can be used to specify an a priori
known component to be included in the linear predictor during fitting.
This should be |
fit.glm |
logical flag.
If |
do.smooth |
logical flag.
If |
do.plot |
logical flag.
If |
do.ci |
logical flag.
If |
eps |
the offset in computing the variance function.
The default is |
control |
a list of parameters for controlling the fitting process;
see |
do.points |
plot the points on the plot where the
(log-) likelihood is computed for the given values of |
method |
the method for computing the (log-) likelihood.
One of
|
conf.level |
the confidence level for the computation of the nominal
confidence interval.
The default is |
phi.method |
the method for estimating |
verbose |
the amount of feedback requested:
|
add0 |
if |
For each value in p.vec
,
the function computes an estimate of phi
and then computes the value of the log-likelihood for these parameters.
The plot of the log-likelihood against p.vec
allows the maximum likelihood value of p
to be found.
Once the value of p
is found,
the distribution within the class of Tweedie distribution is identified.
The main purpose of the function is to estimate the value
of the Tweedie index parameter, p,
which is produced by the output list as p.max
.
Optionally (if do.plot=TRUE
),
a plot is produced that shows the profile log-likelihood
computed at each value in p.vec
(smoothed if do.smooth=TRUE
).
This function can be temperamental
(for theoretical reasons involved in numerically computing the density),
and this plot shows the values of p requested on the
horizontal axis (using rug
);
there may be fewer points on the plot,
since the likelihood some values of p requested
may have returned NaN
, Inf
or NA
.
A list containing the components:
y
and x
(such that plot(x,y)
(partially)
recreates the profile likelihood plot);
ht
(the height of the nominal confidence interval);
L
(the estimate of the (log-) likelihood at each given value of p
);
p
(the p
-values used);
phi
(the computed values of phi
at the values in p
);
p.max
(the estimate of the mle of p
);
L.max
(the estimate of the (log-) likelihood at p.max
);
phi.max
(the estimate of phi
at p.max
);
ci
(the lower and upper limits of the confidence interval for p
);
method
(the method used for estimation: series
, inversion
,
interpolation
or saddlepoint
);
phi.method
(the method used for estimation of phi
:
saddlepoint
or phi
).
If glm.fit
is TRUE
,
the list also contains a component glm.obj
,
a glm
object for the fitted Tweedie generalized linear model.
The estimates of p
and phi
are printed.
The result is printed invisibly.
If the response variable has any exact zeros,
the values in p.vec
must all be between one and two.
The function is sometimes unstable and may fail.
It may also be very slow.
One solution is to change the method.
The default is method="inversion"
(the default);
then try method="series"
,
method="interpolation"
and
method="saddlepoint"
in that order.
Note that
method="saddlepoint"
is an approximate method only.
Also make sure the values in p.vec
are suitable for the data
(see above paragraph).
It is recommended that for the first use with a data set,
use p.vec
with only a small number of values
and set
do.smooth=FALSE
,
do.ci=FALSE
.
If this is successful,
a larger vector p.vec
and smoothing can be used.
Peter Dunn (pdunn2@usc.edu.au)
Dunn, P. K. and Smyth, G. K. (2008). Evaluation of Tweedie exponential dispersion model densities by Fourier inversion. Statistics and Computing, 18, 73–86. doi: 10.1007/s11222-007-9039-6
Dunn, Peter K and Smyth, Gordon K (2005). Series evaluation of Tweedie exponential dispersion model densities Statistics and Computing, 15(4). 267–280. doi: 10.1007/s11222-005-4070-y
Dunn, Peter K and Smyth, Gordon K (2001). Tweedie family densities: methods of evaluation. Proceedings of the 16th International Workshop on Statistical Modelling, Odense, Denmark, 2–6 July
Jorgensen, B. (1987). Exponential dispersion models. Journal of the Royal Statistical Society, B, 49, 127–162.
Jorgensen, B. (1997). Theory of Dispersion Models. Chapman and Hall, London.
Nelder, J. A. and Pregibon, D. (1987). An extended quasi-likelihood function. Biometrika 74(2), 221–232. doi: 10.1093/biomet/74.2.221
Tweedie, M. C. K. (1984). An index which distinguishes between some important exponential families. Statistics: Applications and New Directions. Proceedings of the Indian Statistical Institute Golden Jubilee International Conference (Eds. J. K. Ghosh and J. Roy), pp. 579-604. Calcutta: Indian Statistical Institute.
library(statmod) # Needed to use tweedie.profile # Generate some fictitious data test.data <- rgamma(n = 200, scale = 1, shape = 1) # The gamma is a Tweedie distribution with power = 2; # let's see if p = 2 is suggested by tweedie.profile: ## Not run: out <- tweedie.profile( test.data ~ 1, p.vec = seq(1.5, 2.5, by = 0.2) ) out$p.max out$ci ## End(Not run)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.