Multidimensional discrete item response theory
mdirt
fits a variety of item response models with discrete latent variables.
These include, but are not limited to, latent class analysis, multidimensional latent
class models, multidimensional discrete latent class models, DINA/DINO models,
grade of measurement models, C-RUM, and so on. If response models are not defined explicitly
then customized models can defined using the createItem
function.
mdirt( data, model, customTheta = NULL, structure = NULL, item.Q = NULL, nruns = 1, method = "EM", covdata = NULL, formula = NULL, itemtype = "lca", optimizer = "nlminb", return_max = TRUE, group = NULL, GenRandomPars = FALSE, verbose = TRUE, pars = NULL, technical = list(), ... )
data |
a |
model |
number of mutually exclusive classes to fit, or alternatively a more specific
|
customTheta |
input passed to |
structure |
an R formula allowing the profile probability patterns (i.e., the structural component of
the model) to be fitted according to a log-linear model. When |
item.Q |
a list of item-level Q-matrices indicating how the respective categories should be
modeled by the underlying attributes. Each matrix must represent a K_i \times A matrix,
where K_i represents the number of categories for the ith item, and A is the number
of attributes included in the |
nruns |
a numeric value indicating how many times the model should be fit to the data
when using random starting values. If greater than 1, |
method |
estimation method. Can be 'EM' or 'BL' (see |
covdata |
a data.frame of data used for latent regression models |
formula |
an R formula (or list of formulas) indicating how the latent traits
can be regressed using external covariates in |
itemtype |
a vector indicating the itemtype associated with each item.
For discrete models this is limited to only 'lca' or items defined using a
|
optimizer |
optimizer used for the M-step, set to |
return_max |
logical; when |
group |
a factor variable indicating group membership used for multiple group analyses |
GenRandomPars |
logical; use random starting values |
verbose |
logical; turn on messages to the R console |
pars |
used for modifying starting values; see |
technical |
list of lower-level inputs. See |
... |
additional arguments to be passed to the estimation engine. See |
Posterior classification accuracy for each response pattern may be obtained
via the fscores
function. The summary()
function will display
the category probability values given the class membership, which can also
be displayed graphically with plot()
, while coef()
displays the raw coefficient values (and their standard errors, if estimated). Finally,
anova()
is used to compare nested models, while
M2
and itemfit
may be used for model fitting purposes.
The latent class IRT model with two latent classes has the form
P(x = k|θ_1, θ_2, a1, a2) = \frac{exp(a1 θ_1 + a2 θ_2)}{ ∑_j^K exp(a1 θ_1 + a2 θ_2)}
where the θ values generally take on discrete points (such as 0 or 1). For proper identification, the first category slope parameters (a1 and a2) are never freely estimated. Alternatively, supplying a different grid of θ values will allow the estimation of similar models (multidimensional discrete models, grade of membership, etc.). See the examples below.
When the item.Q
for is utilized, the above equation can be understood as
P(x = k|θ_1, θ_2, a1, a2) = \frac{exp(a1 θ_1 Q_{j1} + a2 θ_2 Q_{j2})}{ ∑_j^K exp(a1 θ_1 Q_{j1} + a2 θ_2 Q_{j2})}
where by construction Q
is a K_i \times A matrix indicating whether the category should
be modeled according to the latent class structure. For the standard latent class model, the Q-matrix
has as many rows as categories, as many columns as the number of classes/attributes modeled,
and consist of 0's in the first row and 1's elsewhere. This of course can be over-written by passing
an alternative item.Q
definition for each respective item.
Phil Chalmers rphilip.chalmers@gmail.com
Chalmers, R., P. (2012). mirt: A Multidimensional Item Response Theory Package for the R Environment. Journal of Statistical Software, 48(6), 1-29.
Proctor, C. H. (1970). A probabilistic formulation and statistical analysis for Guttman scaling. Psychometrika, 35, 73-78. doi: 10.18637/jss.v048.i06
#LSAT6 dataset dat <- expand.table(LSAT6) # fit with 2-3 latent classes (mod2 <- mdirt(dat, 2)) ## Not run: (mod3 <- mdirt(dat, 3)) summary(mod2) residuals(mod2) residuals(mod2, type = 'exp') anova(mod2, mod3) M2(mod2) itemfit(mod2) # generate classification plots plot(mod2) plot(mod2, facet_items = FALSE) plot(mod2, profile = TRUE) # available for polytomous data mod <- mdirt(Science, 2) summary(mod) plot(mod) plot(mod, profile=TRUE) # classification based on response patterns fscores(mod2, full.scores = FALSE) # classify individuals either with the largest posterior probability..... fs <- fscores(mod2) head(fs) classes <- 1:2 class_max <- classes[apply(apply(fs, 1, max) == fs, 1, which)] table(class_max) # ... or by probability sampling (i.e., plausible value draws) class_prob <- apply(fs, 1, function(x) sample(1:2, 1, prob=x)) table(class_prob) # plausible value imputations for stochastic classification in both classes pvs <- fscores(mod2, plausible.draws=10) tabs <- lapply(pvs, function(x) apply(x, 2, table)) tabs[[1]] # fit with random starting points (run in parallel to save time) mirtCluster() mod <- mdirt(dat, 2, nruns=10) #-------------------------- # Grade of measurement model # define a custom Theta grid for including a 'fuzzy' class membership (Theta <- matrix(c(1, 0, .5, .5, 0, 1), nrow=3 , ncol=2, byrow=TRUE)) (mod_gom <- mdirt(dat, 2, customTheta = Theta)) summary(mod_gom) #----------------- # Multidimensional discrete latent class model dat <- key2binary(SAT12, key = c(1,4,5,2,3,1,2,1,3,1,2,4,2,1,5,3,4,4,1,4,3,3,4,1,3,5,1,3,1,5,4,5)) # define Theta grid for three latent classes (Theta <- thetaComb(0:1, 3)) (mod_discrete <- mdirt(dat, 3, customTheta = Theta)) summary(mod_discrete) # Located latent class model model <- mirt.model('C1 = 1-32 C2 = 1-32 C3 = 1-32 CONSTRAIN = (1-32, a1), (1-32, a2), (1-32, a3)') (mod_located <- mdirt(dat, model, customTheta = diag(3))) summary(mod_located) #----------------- ### DINA model example # generate some suitable data for a two dimensional DINA application # (first columns are intercepts) set.seed(1) Theta <- expand.table(matrix(c(1,0,0,0, 1,1,0,0, 1,0,1,0, 1,1,1,1), 4, 4, byrow=TRUE), freq = c(200,200,100,500)) a <- matrix(c(rnorm(15, -1.5, .5), rlnorm(5, .2, .3), numeric(15), rlnorm(5, .2, .3), numeric(15), rlnorm(5, .2, .3)), 15, 4) guess <- plogis(a[11:15,1]) # population guess slip <- 1 - plogis(rowSums(a[11:15,])) # population slip dat <- simdata(a, Theta=Theta, itemtype = 'lca') # first column is the intercept, 2nd and 3rd are attributes theta <- cbind(1, thetaComb(0:1, 2)) theta <- cbind(theta, theta[,2] * theta[,3]) #DINA interaction of main attributes model <- mirt.model('Intercept = 1-15 A1 = 1-5 A2 = 6-10 A1A2 = 11-15') # last 5 items are DINA (first 10 are unidimensional C-RUMs) DINA <- mdirt(dat, model, customTheta = theta) coef(DINA, simplify=TRUE) summary(DINA) M2(DINA) # fits well (as it should) cfs <- coef(DINA, simplify=TRUE)$items[11:15,] cbind(guess, estguess = plogis(cfs[,1])) cbind(slip, estslip = 1 - plogis(rowSums(cfs))) ### DINO model example theta <- cbind(1, thetaComb(0:1, 2)) # define theta matrix with negative interaction term (theta <- cbind(theta, -theta[,2] * theta[,3])) model <- mirt.model('Intercept = 1-15 A1 = 1-5, 11-15 A2 = 6-15 Yoshi = 11-15 CONSTRAIN = (11,a2,a3,a4), (12,a2,a3,a4), (13,a2,a3,a4), (14,a2,a3,a4), (15,a2,a3,a4)') # last five items are DINOs (first 10 are unidimensional C-RUMs) DINO <- mdirt(dat, model, customTheta = theta) coef(DINO, simplify=TRUE) summary(DINO) M2(DINO) #doesn't fit as well, because not the generating model ## C-RUM (analogous to MIRT model) theta <- cbind(1, thetaComb(0:1, 2)) model <- mirt.model('Intercept = 1-15 A1 = 1-5, 11-15 A2 = 6-15') CRUM <- mdirt(dat, model, customTheta = theta) coef(CRUM, simplify=TRUE) summary(CRUM) # good fit, but over-saturated (main effects for items 11-15 can be set to 0) M2(CRUM) #------------------ #multidimensional latent class model dat <- key2binary(SAT12, key = c(1,4,5,2,3,1,2,1,3,1,2,4,2,1,5,3,4,4,1,4,3,3,4,1,3,5,1,3,1,5,4,5)) # 5 latent classes within 2 different sets of items model <- mirt.model('C1 = 1-16 C2 = 1-16 C3 = 1-16 C4 = 1-16 C5 = 1-16 C6 = 17-32 C7 = 17-32 C8 = 17-32 C9 = 17-32 C10 = 17-32 CONSTRAIN = (1-16, a1), (1-16, a2), (1-16, a3), (1-16, a4), (1-16, a5), (17-32, a6), (17-32, a7), (17-32, a8), (17-32, a9), (17-32, a10)') theta <- diag(10) # defined explicitly. Otherwise, this profile is assumed mod <- mdirt(dat, model, customTheta = theta) coef(mod, simplify=TRUE) summary(mod) #------------------ # multiple group with constrained group probabilities dat <- key2binary(SAT12, key = c(1,4,5,2,3,1,2,1,3,1,2,4,2,1,5,3,4,4,1,4,3,3,4,1,3,5,1,3,1,5,4,5)) group <- rep(c('G1', 'G2'), each = nrow(SAT12)/2) Theta <- diag(2) # the latent class parameters are technically located in the (nitems + 1) location model <- mirt.model('A1 = 1-32 A2 = 1-32 CONSTRAINB = (33, c1)') mod <- mdirt(dat, model, group = group, customTheta = Theta) coef(mod, simplify=TRUE) summary(mod) #------------------ # Probabilistic Guttman Model (Proctor, 1970) # example analysis can also be found in the sirt package (see ?prob.guttman) data(data.read, package = 'sirt') head(data.read) Theta <- matrix(c(1,0,0,0, 1,1,0,0, 1,1,1,0, 1,1,1,1), 4, byrow=TRUE) model <- mirt.model("INTERCEPT = 1-12 C1 = 1,7,9,11 C2 = 2,5,8,10,12 C3 = 3,4,6") mod <- mdirt(data.read, model, customTheta=Theta) summary(mod) M2(mod) itemfit(mod) ## End(Not run)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.