Smooth Parameter Estimation and Bootstrapping of Generalized Pareto Distributions with Penalized Maximum Likelihood Estimation
gamGPDfit()
fits the parameters of a generalized Pareto
distribution (GPD) depending on covariates in a non- or semiparametric
way.
gamGPDboot()
fits and bootstraps the parameters of a GPD
distribution depending on covariates in a non- or semiparametric
way. Applies the post-blackend bootstrap of Chavez-Demoulin and
Davison (2005).
gamGPDfit(x, threshold, nextremes = NULL, datvar, xiFrhs, nuFrhs, init = fit.GPD(x[,datvar], threshold = threshold, type = "pwm", verbose = FALSE)$par.ests, niter = 32, include.updates = FALSE, eps.xi = 1e-05, eps.nu = 1e-05, progress = TRUE, adjust = TRUE, verbose = FALSE, ...) gamGPDboot(x, B, threshold, nextremes = NULL, datvar, xiFrhs, nuFrhs, init = fit.GPD(x[,datvar], threshold = threshold, type = "pwm", verbose = FALSE)$par.ests, niter = 32, include.updates = FALSE, eps.xi = 1e-5, eps.nu = 1e-5, boot.progress = TRUE, progress = FALSE, adjust = TRUE, verbose = FALSE, debug = FALSE, ...)
x |
data.frame containing the losses (in some component; can be
specified with the argument |
B |
number of bootstrap replications. |
threshold |
threshold of the peaks-over-threshold (POT) method. |
nextremes |
number of excesses. This can be used to determine |
datvar |
name of the data column in |
xiFrhs |
right-hand side of the formula for xi in
the |
nuFrhs |
right-hand side of the formula for nu in
the |
init |
bivariate vector containing initial values for (xi, beta). |
niter |
maximal number of iterations in the backfitting algorithm. |
include.updates |
|
eps.xi |
epsilon for stop criterion for xi. |
eps.nu |
epsilon for stop criterion for nu. |
boot.progress |
|
progress |
|
adjust |
|
verbose |
|
debug |
|
... |
additional arguments passed to |
gamGPDfit()
fits the parameters xi and
beta of the generalized Pareto distribution
GPD(xi,beta) depending on covariates in
a non- or semiparametric way. The distribution function is given by
G[xi,beta](x)=1-(1+xi x/beta)^(-1/xi), x>=0,
for xi>0 (which is what we assume) and
beta>0. Note that β is also denoted by
σ in this package. Estimation of xi
and beta by gamGPDfit()
is done via penalized maximum
likelihood estimation, where the estimators are computed with a
backfitting algorithm. In order to guarantee convergence of this
algorithm, a reparameterization of beta in terms of the parameter
nu is done via
beta=exp(nu)/(1+xi).
The parameters xi and nu (and thus beta) are allowed to depend on covariates (including time) in a non- or semiparametric way, for example:
xi=xi(x,t)=x^Talpha[xi]+h[xi](t),
nu=nu(x,t)=x^Talpha[nu]+h[nu](t),
where x denotes the vector of covariates, alpha[xi], alpha[nu] are parameter vectors and h[xi], h[nu] are regression splines. For more details, see the references and the source code.
gamGPDboot()
first fits the GPD parameters via
gamGPDfit()
. It then conducts the post-blackend bootstrap of
Chavez-Demoulin and Davison (2005). To this end, it computes the
residuals, resamples them (B
times), reconstructs the
corresponding excesses, and refits the GPD parameters via
gamGPDfit()
again.
Note that if gam()
fails in gamGPDfit()
or the
fitting or one of the bootstrap replications in gamGPDboot()
,
then the output object contains (an) empty (sub)list(s). These
failures typically happen for too small sample sizes.
gamGPDfit()
returns either an empty list (list()
; in
case at least one of the two gam()
calls in the internal
function gamGPDfitUp()
fails) or a list with the components
xi
:estimated parameters xi;
beta
:estimated parameters beta;
nu
:estimated parameters nu;
se.xi
:standard error for xi ((possibly adjusted) second-order derivative of the reparameterized log-likelihood with respect to xi) multiplied by -1;
se.nu
:standard error for nu ((possibly adjusted) second-order derivative of the reparameterized log-likelihood with respect to nu) multiplied by -1;
xi.covar
:(unique) covariates for xi;
nu.covar
:(unique) covariates for nu;
covar
:available covariate combinations used for fitting beta (xi, nu);
y
:vector of excesses (exceedances minus threshold);
res
:residuals;
MRD
:mean relative distances between for all iterations, calculated between old parameters (xi, nu) (from the last iteration) and new parameters (currently estimated ones);
logL
:log-likelihood at the estimated parameters;
xiObj
:R object of type gamObject
for estimated
xi (returned by mgcv::gam()
);
nuObj
:R object of type gamObject
for estimated
nu (returned by mgcv::gam()
);
xiUpdates
:if include.updates
is
TRUE
, updates for xi for each
iteration. This is a list of R objects of type gamObject
which contains xiObj
as last element;
nuUpdates
:if include.updates
is
TRUE
, updates for nu for each
iteration. This is a list of R objects of type gamObject
which contains nuObj
as last element;
gamGPDboot()
returns a list of length B+1
where
the first component contains the results of
the initial fit via gamGPDfit()
and the other B
components contain the results for each replication of the
post-blackend bootstrap. Components for which gam()
fails (e.g., due to too few data) are given as empty lists (list()
).
Marius Hofert, Valerie Chavez-Demoulin.
Chavez-Demoulin, V., and Davison, A. C. (2005). Generalized additive models for sample extremes. Applied Statistics 54(1), 207–222.
Chavez-Demoulin, V., Embrechts, P., and Hofert, M. (2014). An extreme value approach for modeling Operational Risk losses depending on covariates. Journal of Risk and Insurance; accepted.
library(QRM) ## generate an example data set years <- 2003:2012 # years nyears <- length(years) n <- 250 # sample size for each (different) xi u <- 200 # threshold rGPD <- function(n, xi, beta) ((1-runif(n))^(-xi)-1)*beta/xi # sampling GPD set.seed(17) # setting seed xi.true.A <- seq(0.4, 0.8, length=nyears) # true xi for group "A" ## generate losses for group "A" lossA <- unlist(lapply(1:nyears, function(y) u + rGPD(n, xi=xi.true.A[y], beta=1))) xi.true.B <- xi.true.A^2 # true xi for group "B" ## generate losses for group "B" lossB <- unlist(lapply(1:nyears, function(y) u + rGPD(n, xi=xi.true.B[y], beta=1))) ## build data frame time <- rep(rep(years, each=n), 2) # "2" stands for the two groups covar <- rep(c("A","B"), each=n*nyears) value <- c(lossA, lossB) x <- data.frame(covar=covar, time=time, value=value) ## fit eps <- 1e-3 # to decrease the run time for this example require(mgcv) # due to s() fit <- gamGPDfit(x, threshold=u, datvar="value", xiFrhs=~covar+s(time)-1, nuFrhs=~covar+s(time)-1, eps.xi=eps, eps.nu=eps) ## note: choosing s(..., bs="cr") will fit cubic splines ## grab the fitted values per group and year xi.fit <- fitted(fit$xiObj) xi.fit. <- xi.fit[1+(0:(2*nyears-1))*n] # pick fit for each group and year xi.fit.A <- xi.fit.[1:nyears] # fit for "A" and each year xi.fit.B <- xi.fit.[(nyears+1):(2*nyears)] # fit for "B" and each year ## plot the fitted values of xi and the true ones we simulated from par(mfrow=c(1,2)) plot(years, xi.true.A, type="l", ylim=range(xi.true.A, xi.fit.A), main="Group A", xlab="Year", ylab=expression(xi)) points(years, xi.fit.A, type="l", col="red") legend("topleft", inset=0.04, lty=1, col=c("black", "red"), legend=c("true", "fitted"), bty="n") plot(years, xi.true.B, type="l", ylim=range(xi.true.B, xi.fit.B), main="Group B", xlab="Year", ylab=expression(xi)) points(years, xi.fit.B, type="l", col="blue") legend("topleft", inset=0.04, lty=1, col=c("black", "blue"), legend=c("true", "fitted"), bty="n")
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.