Fast estimation of multinomial logit models
Time- and memory-efficient estimation of multinomial logit models using the maximum likelihood method. Targeted at large-scale multiclass classification problems in econometrics and machine learning. Numerical optimization is performed by the Newton-Raphson method, using an optimized, parallel C++ library to achieve fast computation of Hessian matrices. The user interface is closely related to that of the CRAN package mlogit.
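The Newton-Raphson iteration mentioned above can be sketched in plain R on a toy problem. This is only an illustration, not the package's C++ implementation: it assumes a model with individual-specific variables only, takes full Newton steps with no safeguards, and borrows the names ftol, gtol and maxiter from the mnlogit arguments purely by analogy. How mnlogit itself combines its stopping rules is not stated here, so the convergence test below is an assumption.

```r
# Toy multinomial logit fitted by Newton-Raphson (illustrative sketch only).
set.seed(1)
n <- 500; K <- 3
X <- cbind(1, rnorm(n))                    # intercept + one covariate
B_true <- cbind(c(0.5, 1), c(-0.5, -1))    # p x (K-1); alternative 1 is the base
U <- cbind(0, X %*% B_true)
P <- exp(U) / rowSums(exp(U))
y <- apply(P, 1, function(p) sample.int(K, 1, prob = p))
Y <- outer(y, 2:K, "==") * 1               # n x (K-1) indicators of non-base choices

# Log-likelihood, gradient and Hessian at a stacked coefficient vector.
loglik_parts <- function(beta) {
  p <- ncol(X)
  B <- matrix(beta, ncol = K - 1)
  eU <- exp(X %*% B)
  denom <- 1 + rowSums(eU)
  Pm <- eU / denom                         # probabilities of non-base alternatives
  ll <- sum(Y * (X %*% B)) - sum(log(denom))
  g <- as.vector(crossprod(X, Y - Pm))
  H <- matrix(0, p * (K - 1), p * (K - 1))
  for (k in 1:(K - 1)) for (l in 1:(K - 1)) {
    w <- Pm[, k] * ((k == l) - Pm[, l])
    H[(k - 1) * p + 1:p, (l - 1) * p + 1:p] <- -crossprod(X, X * w)
  }
  list(ll = ll, g = g, H = H)
}

ftol <- 1e-6; gtol <- 1e-6; maxiter <- 50  # names borrowed by analogy
beta <- rep(0, ncol(X) * (K - 1))
cur <- loglik_parts(beta)
for (iter in seq_len(maxiter)) {
  beta <- beta - solve(cur$H, cur$g)       # full Newton step
  nxt <- loglik_parts(beta)
  converged <- abs(nxt$ll - cur$ll) < ftol || sqrt(sum(nxt$g^2)) < gtol
  cur <- nxt
  if (converged) break
}
matrix(beta, ncol = K - 1)                 # estimates, one column per non-base alternative
```

Most of the work per iteration is in building the Hessian blocks, which is the part the description says is parallelized.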
mnlogit(formula, data, choiceVar = NULL, maxiter = 50, ftol = 1e-6,
        gtol = 1e-6, weights = NULL, ncores = 1, na.rm = TRUE,
        print.level = 0, linDepTol = 1e-6, start = NULL,
        alt.subset = NULL, ...)

## S3 method for class 'mnlogit'
fitted(object, outcome = TRUE, ...)

## S3 method for class 'mnlogit'
residuals(object, outcome = TRUE, ...)

## S3 method for class 'mnlogit'
df.residual(object, ...)

## S3 method for class 'mnlogit'
terms(x, ...)

## S3 method for class 'mnlogit'
update(object, new, ...)

## S3 method for class 'mnlogit'
print(x, digits = max(3, getOption("digits") - 2),
      width = getOption("width"),
      what = c("obj", "eststat", "modsize"), ...)

## S3 method for class 'mnlogit'
vcov(object, ...)

## S3 method for class 'mnlogit'
logLik(object, ...)

## S3 method for class 'mnlogit'
summary(object, ...)

## S3 method for class 'mnlogit'
print.summary(x, digits = max(3, getOption("digits") - 2),
              width = getOption("width"), ...)

## S3 method for class 'mnlogit'
index(object, ...)

## S3 method for class 'mnlogit'
predict(object, newdata = NULL, probability = TRUE,
        returnData = FALSE, choiceVar = NULL, ...)

## S3 method for class 'mnlogit'
coef(object, order = FALSE, as.list = FALSE, ...)
formula |
|
data, newdata |
A |
choiceVar |
A string naming the column in 'data' which has the list of choices. Note: This argument is not used if |
maxiter |
An integer indicating the maximum number of Newton iterations. If |
ftol |
A real number indicating the tolerance on the difference between two successive log-likelihood values. |
gtol |
A real number indicating tolerance on norm of the gradient. |
weights |
Optional vector of (positive) frequency weights, one for each observation. |
ncores |
An integer indicating the number of processors allowed for Hessian calculations. |
na.rm |
a logical variable which indicates whether rows of the data frame containing NAs will be removed. |
print.level |
An integer which controls the amount of information to be printed during execution. |
linDepTol |
Tolerance for detecting linear dependence between columns in input data. Dependent columns are removed from the estimation. |
start |
Named vector of coefficients to use as initial guess. Use naming convention as given by |
alt.subset |
Subset of alternatives to perform estimation on. |
... |
Currently unused. |
object, x |
An object of class |
outcome |
a boolean which indicates, for the |
new |
An |
digits |
Number of digits to print. |
width |
The width of printing. |
what |
Specifies what to print. The default option, 'obj', gives the standard print output for mnlogit objects. Option 'eststat' prints estimation statistics, and option 'modsize' prints model size information. |
probability |
If TRUE, predict returns the probability matrix; otherwise it returns the choice with the highest probability for each observation. |
returnData |
If |
order |
If |
as.list |
Returns estimated model coefficients grouped by variable type. |
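The two kinds of output described for the probability argument are related in a simple way, which the following base R sketch illustrates. The probability matrix here is made up for illustration (it is not output from mnlogit), and the column names are just example alternative labels.

```r
# Hypothetical probability matrix: one row per observation,
# one column per alternative (values invented for illustration).
prob <- matrix(c(0.1, 0.6, 0.2, 0.1,
                 0.4, 0.3, 0.2, 0.1),
               nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("beach", "boat", "charter", "pier")))

# The alternative with the highest probability in each row.
chosen <- colnames(prob)[apply(prob, 1, which.max)]
chosen
```

For these values, chosen is "boat" for the first observation and "beach" for the second.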
An object of class mnlogit, with elements:
coefficients |
the named vector of coefficients. |
logLik |
the value of the log-likelihood function at exit. |
gradient |
the gradient of the log-likelihood function at exit. |
hessian |
the Hessian of the log-likelihood function at exit. |
est.stat |
Newton-Raphson estimation statistics. |
fitted.values |
Estimated probabilities of the alternative selected in each observation. |
probabilities |
the probability matrix: |
residuals |
The residual. Has attribute |
df |
The number of estimated coefficients in the model. |
AIC |
The AIC value of the fitted model. |
choices |
The vector of the alternatives' names. |
model.size |
Information about number of parameters in model. |
ordered.coeff |
Vector of coefficients ordered by variable name. |
model |
The |
freq |
The relative frequency of each choice in input data. |
formula |
The |
call |
The |
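As a rough illustration of how some of these elements relate to one another, the sketch below picks the probability of the selected alternative out of a probability matrix (the idea behind fitted.values) and computes an AIC from a log-likelihood and a parameter count. All numbers are invented, and the AIC line assumes the usual definition, 2 * df - 2 * logLik; the page does not state the formula mnlogit uses.

```r
# Hypothetical probability matrix (rows: observations, columns: alternatives).
prob <- matrix(c(0.1, 0.6, 0.2, 0.1,
                 0.4, 0.3, 0.2, 0.1),
               nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("beach", "boat", "charter", "pier")))
observed <- c("boat", "pier")             # alternative actually selected in each row

# Probability of the selected alternative, via (row, column) matrix indexing.
idx <- cbind(seq_len(nrow(prob)), match(observed, colnames(prob)))
fitted_selected <- prob[idx]              # 0.6 and 0.1

# AIC under the standard definition (assumed, with invented inputs).
loglik <- -1250.5
df <- 8
aic <- 2 * df - 2 * loglik                # 2517
```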
1. The data must be in the 'long' format. This means that for each observation there must be as many rows as there are alternatives (which should be grouped together).
2. The formula should be specified in the format: responseVar ~ choice specific variables with generic coefficients | individual specific variables | choice specific variables with choice specific coefficients. These are the 3 available variable types.
3. Any of the variable types may be omitted. To omit one, use "1" as a placeholder.
4. An alternative specific intercept is included by default in the estimation. To omit it, use a '-1' or '0' anywhere in the formula.
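The 'long' layout from note 1 can be sketched in base R as follows. The data, the column names (id, alt, mode, income, price) and the alternative labels are all hypothetical; the commented-out call at the end assumes the mnlogit package is installed and that a logical response column is acceptable, as the Fish example data suggests.

```r
# Hypothetical 'wide' data: one row per observation.
wide <- data.frame(
  id          = 1:3,
  choice      = c("boat", "pier", "beach"),
  income      = c(7.1, 1.3, 3.8),
  price.beach = c(158, 15, 162),
  price.boat  = c(158, 10, 24),
  price.pier  = c(158, 15, 162)
)
alts <- c("beach", "boat", "pier")

# 'Long' data: one row per (observation, alternative), with the rows of
# each observation grouped together.
long <- do.call(rbind, lapply(seq_len(nrow(wide)), function(i) {
  data.frame(
    id     = wide$id[i],
    alt    = alts,
    mode   = wide$choice[i] == alts,      # TRUE for the chosen alternative
    income = wide$income[i],              # individual specific, repeated per row
    price  = unlist(wide[i, paste0("price.", alts)], use.names = FALSE)
  )
}))

# price gets a generic coefficient; income is individual specific.
fm <- formula(mode ~ price | income)
# fit <- mnlogit(fm, long, choiceVar = "alt")   # requires the mnlogit package
```

Each observation contributes exactly length(alts) consecutive rows, and exactly one of them has mode equal to TRUE.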
Asad Hasan, Wang Zhiyu, Alireza S. Mahani
Asad Hasan, Zhiyu Wang, Alireza S. Mahani (2016). Fast Estimation of Multinomial Logit Models: R Package mnlogit. Journal of Statistical Software, 75(3), 1-24. doi:10.18637/jss.v075.i03
Croissant, Yves. Estimation of multinomial logit models in R: The mlogit Packages. https://cran.r-project.org/package=mlogit
Train, K. (2004). Discrete Choice Methods with Simulation, Cambridge University Press.
library(mnlogit)
data(Fish, package = "mnlogit")
fm <- formula(mode ~ price | income | catch)
fit <- mnlogit(fm, Fish, ncores = 2)

## Not run:
fit <- mnlogit(fm, Fish, choiceVar = "alt", ncores = 2)  # same effect as previous
summary(fit)
print(fit)
predict(fit)
print(fit, what = "eststat")
print(fit, what = "modsize")

# Formula examples (see also Note)
fm <- formula(mode ~ 1 | income)     # Only type-2 with intercept
fm <- formula(mode ~ price - 1)      # Only type-1, no intercept
fm <- formula(mode ~ 1 | 1 | catch)  # Only type-3, including intercept
## End(Not run)