General Diagnostic Model
This function estimates the general diagnostic model (von Davier, 2008; Xu & von Davier, 2008), which covers multidimensional item response models with ordered discrete or continuous latent variables for polytomous item responses.
gdm(data, theta.k, irtmodel="2PL", group=NULL, weights=rep(1, nrow(data)),
    Qmatrix=NULL, thetaDes=NULL, skillspace="loglinear", b.constraint=NULL,
    a.constraint=NULL, mean.constraint=NULL, Sigma.constraint=NULL,
    delta.designmatrix=NULL, standardized.latent=FALSE, centered.latent=FALSE,
    centerintercepts=FALSE, centerslopes=FALSE, maxiter=1000, conv=1e-5,
    globconv=1e-5, msteps=4, convM=.0005, decrease.increments=FALSE,
    use.freqpatt=FALSE, progress=TRUE, PEM=FALSE, PEM_itermax=maxiter, ...)

## S3 method for class 'gdm'
summary(object, file=NULL, ...)

## S3 method for class 'gdm'
print(x, ...)

## S3 method for class 'gdm'
plot(x, perstype="EAP", group=1, barwidth=.1, histcol=1, cexcor=3,
     pchpers=16, cexpers=.7, ...)
data |
An N × I matrix of polytomous item responses with categories k = 0, 1, ..., K |
theta.k |
In the one-dimensional case it must be a vector.
For multidimensional models it has to be a list
of skill vectors if the theta grid differs between
dimensions. If not, a vector input can be supplied.
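If an estimated skillspace (skillspace="est") is used, theta.k contains the initial values of the θ grid: a vector in the unidimensional case or a matrix with one column per dimension (see Examples 4 and 5). |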
irtmodel |
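The item response model. The default "2PL" estimates item slopes for each item and dimension; "1PL" fixes all item slopes to 1 and "2PLcat" estimates item-, dimension- and category-specific slopes (see Details). |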
group |
An optional vector of group identifiers for
multiple group estimation.
For |
weights |
An optional vector of sample weights |
Qmatrix |
An optional array of dimension I × D × K which indicates pre-specified item loadings on dimensions. The default for category k is the score k, i.e. the scoring in the (generalized) partial credit model. |
thetaDes |
A design matrix for specifying nonlinear item response functions (see Example 1, Models 4 and 5) |
skillspace |
The parametric assumption of the skillspace.
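The default "loglinear" applies log-linear smoothing to the trait distribution, "normal" assumes a (multivariate) normal distribution, and "est" estimates the θ grid itself (see Details). |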
b.constraint |
Optional matrix with C_b rows and three columns for fixing C_b item intercepts b_{ik}. 1st column: item index, 2nd column: category index, 3rd column: fixed value of the item intercept |
a.constraint |
Optional matrix with C_a rows and four columns for fixing
C_a item slopes a_{idk}.
1st column: item index, 2nd column: dimension index,
3rd column: category index, 4th column: fixed value of the item slope |
mean.constraint |
A C × 3 matrix for
constraining C means in the
normal distribution assumption (skillspace="normal"). |
Sigma.constraint |
A C × 4 matrix for
constraining C covariances in the
normal distribution assumption (skillspace="normal"). |
delta.designmatrix |
The design matrix of δ parameters for the reduced skillspace estimation (see Xu & von Davier, 2008) |
standardized.latent |
A logical indicating whether, in a uni- or multidimensional
model, all latent variables of the first group should be normally distributed
and standardized. The default is FALSE. |
centered.latent |
A logical indicating whether, in a uni- or multidimensional
model, all latent variables of the first group should be normally
distributed with zero means. The default is FALSE. |
centerintercepts |
A logical indicating whether intercepts should be centered to have a mean of 0 for all dimensions. This argument does not (yet) work properly for varying numbers of item categories. |
centerslopes |
A logical indicating whether item slopes should be centered to have
a mean of 1 for all dimensions. This argument only works for
|
maxiter |
Maximum number of iterations |
conv |
Convergence criterion for item parameters and distribution parameters |
globconv |
Global deviance convergence criterion |
msteps |
Maximum number of M steps in estimating b and a item parameters. The default is to use 4 M steps. |
convM |
Convergence criterion in M step |
decrease.increments |
A logical indicating whether the increments
of the a and b parameters in the M step should decrease during iterations.
The default is FALSE. |
use.freqpatt |
A logical indicating whether the frequencies of unique item response patterns
should be used. For large data sets |
progress |
An optional logical indicating whether the function should print the progress of iteration in the estimation process. |
PEM |
Logical indicating whether the P-EM acceleration should be applied (Berlinet & Roland, 2012). |
PEM_itermax |
Number of iterations in which the P-EM method should be applied. |
object |
A required object of class gdm |
file |
Optional file name for a file in which |
x |
A required object of class gdm |
perstype |
Person parameter estimate type. Can be either
|
barwidth |
Bar width in |
histcol |
Color of histogram bars in |
cexcor |
Font size for print of correlation in |
pchpers |
Point type for scatter plot of person
parameters in |
cexpers |
Point size for scatter plot of person
parameters in |
... |
Optional parameters to be passed to or from other methods will be ignored. |
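For the default skillspace="loglinear", the trait distribution on the θ grid is smoothed by a log-linear model (Xu & von Davier, 2008): log pi.k equals a linear combination of design columns with parameters δ, up to a normalizing constant. The sketch below is a minimal illustration under an assumed unidimensional design with a linear and a quadratic term in θ, for which δ = (0, -0.5) reproduces a discretized standard normal distribution. The design that gdm uses by default, or that is supplied through delta.designmatrix, may differ, and loglinear_pi is a hypothetical helper, not part of CDM.

```r
# Log-linear smoothing of a discrete trait distribution (sketch):
# log pi.k = X %*% delta + constant
theta.k <- seq(-6, 6, len = 15)                   # grid as in Example 1
X <- cbind(theta = theta.k, theta2 = theta.k^2)   # assumed design: theta, theta^2
delta <- c(0, -0.5)                               # hypothetical delta values

loglinear_pi <- function(X, delta) {
  eta <- as.vector(X %*% delta)
  w <- exp(eta - max(eta))                        # subtract max for numerical stability
  w / sum(w)                                      # normalize to probabilities
}

pi.k <- loglinear_pi(X, delta)
all.equal(pi.k, dnorm(theta.k) / sum(dnorm(theta.k)))   # TRUE
```

A non-zero linear coefficient shifts the distribution, and adding higher powers such as θ³ to the design allows skewed shapes.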
Case irtmodel="1PL":
Equal item slopes of 1 are assumed in this model. Therefore,
it corresponds to a generalized multidimensional Rasch model.
$$\operatorname{logit} P(X_{nj}=k \mid \boldsymbol{\theta}_n) = b_{jk} + \sum_d q_{jdk}\,\theta_{nd}$$
The Q-matrix entries q_{jdk} are pre-specified by the user.
Case irtmodel="2PL":
For each item and each dimension, different item slopes a_{jd}
are estimated:
$$\operatorname{logit} P(X_{nj}=k \mid \boldsymbol{\theta}_n) = b_{jk} + \sum_d a_{jd}\, q_{jdk}\,\theta_{nd}$$
Case irtmodel="2PLcat":
For each item, each dimension and each category,
different item slopes a_{jdk}
are estimated:
$$\operatorname{logit} P(X_{nj}=k \mid \boldsymbol{\theta}_n) = b_{jk} + \sum_d a_{jdk}\, q_{jdk}\,\theta_{nd}$$
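The three cases can be sketched in base R for a single item. The sketch assumes that, for polytomous items, the logit is taken relative to the reference category 0 (so b_{j0} = 0 and q_{jd0} = 0), which gives P(X_{nj}=k | θ_n) ∝ exp(b_{jk} + Σ_d a_{jdk} q_{jdk} θ_{nd}). The function gdm_probs is a hypothetical helper, not part of CDM: a slope matrix of ones gives the 1PL case, slopes that are constant across categories give the 2PL case, and category-specific slopes give the 2PLcat case. Intercepts and slopes are made-up values, and the default scoring q_{jdk} = k is used.

```r
# Category probabilities of one item in the general diagnostic model (sketch).
# theta: grid of theta points (vector or TP x D matrix)
# b:     intercepts b_k for categories 0..K, with b[1] = 0 (reference category)
# Q:     (K+1) x D scoring matrix q_dk, first row 0
# A:     (K+1) x D slope matrix a_dk (all ones = 1PL)
gdm_probs <- function(theta, b, Q, A = matrix(1, nrow(Q), ncol(Q))) {
  theta <- as.matrix(theta)                # TP x D
  eta <- theta %*% t(A * Q)                # TP x (K+1): sum_d a_dk q_dk theta_d
  eta <- sweep(eta, 2, b, "+")             # add category intercepts
  eta <- eta - apply(eta, 1, max)          # stabilize before exponentiating
  p <- exp(eta)
  p / rowSums(p)                           # normalize over categories
}

theta <- seq(-4, 4, len = 11)              # unidimensional grid (D = 1)
b <- c(0, 0.5, -0.2)                       # hypothetical intercepts, K = 2
Q <- matrix(0:2, ncol = 1)                 # default scoring q_k = k

P_1pl <- gdm_probs(theta, b, Q)                                    # all slopes 1
P_2pl <- gdm_probs(theta, b, Q, A = matrix(1.2, 3, 1))             # one slope a_j
P_cat <- gdm_probs(theta, b, Q, A = matrix(c(1, 1.4, 0.9), 3, 1))  # slopes a_jk
round(P_1pl[6, ], 3)                       # theta = 0: 0.288 0.475 0.236
```

Because the slope enters as a product with θ, the 2PL probabilities at θ equal the 1PL probabilities at 1.2θ in this example.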
Note that the 2PLcat model can be generalized to include terms of any transformation t_h of the θ_n vector (e.g. quadratic terms, step functions or interactions), specified through the argument thetaDes, such that the model can be formulated as
$$\operatorname{logit} P(X_{nj}=k \mid \boldsymbol{\theta}_n) = b_{jk} + \sum_h a_{jhk}\, q_{jhk}\, t_h(\boldsymbol{\theta}_n)$$
In general, the number H of functions t_1, ..., t_H will be larger than the dimension D of θ.
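The transformed values t_h(θ) on the grid are what a thetaDes design matrix holds. The sketch below evaluates dichotomous item response functions for the quadratic design of Example 1, Model 4 and the step design of Example 1, Model 5; with K = 1 and q_{jh1} = 1 the equation above reduces to P(X=1 | θ) = logistic(b_{j1} + Σ_h a_{jh} t_h(θ)). The intercepts and slopes are hypothetical values chosen for illustration, not estimates from those models.

```r
# Item response functions built from transformed theta values (sketch).
theta.k <- seq(-6, 6, len = 15)

# Quadratic design, as in Example 1, Model 4: t_1 = theta, t_2 = theta^2
thetaDes <- cbind(F1 = theta.k, F1q = theta.k^2)
b1 <- -0.3                          # hypothetical intercept
a  <- c(1.1, -0.08)                 # hypothetical slopes for t_1 and t_2
p_quad <- plogis(b1 + as.vector(thetaDes %*% a))

# Step design, as in Example 1, Model 5: a single indicator theta > 0
stepDes <- matrix(1 * (theta.k > 0), ncol = 1)
p_step <- plogis(-1 + 2 * stepDes[, 1])   # hypothetical intercept -1, slope 2
unique(round(p_step, 4))                  # 0.2689 0.7311
```

The step design yields only two response probabilities, one for θ below and one for θ above zero, which matches the comment in Example 1, Model 5.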
The estimation follows an EM algorithm as described in von Davier and Yamamoto (2004) and von Davier (2008).
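One E step of such an EM algorithm can be sketched for dichotomous 1PL items on a unidimensional grid: each person's likelihood over the grid (cf. the output p.xi.aj) is weighted by the current trait distribution pi.k and normalized to the posterior (cf. posterior), and posterior-weighted counts give expected counts per grid point and item, of the kind stored in n.ik. This is a simplified illustration, not CDM's implementation; missing data, sample weights and groups are ignored, and the data and intercepts are toy values.

```r
# One E step for dichotomous 1PL items on a unidimensional grid (sketch).
set.seed(1)
theta.k <- seq(-4, 4, len = 11)                  # grid with TP = 11 points
pi.k <- dnorm(theta.k) / sum(dnorm(theta.k))     # current trait distribution
b <- c(-1, 0, 1)                                 # current item intercepts
dat <- matrix(rbinom(5 * 3, 1, 0.5), nrow = 5)   # toy data: N = 5, I = 3

# P(X_i = 1 | theta_t) under logit P = b_i + theta_t: TP x I
prob1 <- plogis(outer(theta.k, b, "+"))

# individual likelihood of each response pattern on the grid: N x TP
like <- exp(dat %*% t(log(prob1)) + (1 - dat) %*% t(log(1 - prob1)))

# posterior distribution over the grid: N x TP, each row sums to 1
post <- sweep(like, 2, pi.k, "*")
post <- post / rowSums(post)

# expected counts passed on to the M step
n.total   <- colSums(post)       # expected number of persons per grid point
n.correct <- t(post) %*% dat     # expected number of correct responses, TP x I
```

In the M step, the item parameters would then be updated from n.correct and n.total, and pi.k from n.total.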
In case of skillspace="est", the θ vectors
(the grid of the theta distribution) are estimated (Bartolucci, 2007;
Bacci, Bartolucci & Gnaldi, 2012). This model is called a multidimensional
latent class item response model.
An object of class gdm. The list contains the
following entries:
item |
Data frame with item parameters |
person |
Data frame with person parameters:
|
EAP.rel |
Reliability of the EAP |
deviance |
Deviance |
ic |
Information criteria, number of estimated parameters |
b |
Item intercepts b_{jk} |
se.b |
Standard error of item intercepts b_{jk} |
a |
Item slopes a_{jd} (irtmodel="2PL") or a_{jdk} (irtmodel="2PLcat") |
se.a |
Standard errors of item slopes a_{jd} (irtmodel="2PL") or a_{jdk} (irtmodel="2PLcat") |
itemfit.rmsea |
The RMSEA item fit index (see |
mean.rmsea |
Mean of RMSEA item fit indexes. |
Qmatrix |
Used Q-matrix |
pi.k |
Trait distribution |
mean.trait |
Means of trait distribution |
sd.trait |
Standard deviations of trait distribution |
skewness.trait |
Skewnesses of trait distribution |
correlation.trait |
List of correlation matrices of trait distribution corresponding to each group |
pjk |
Item response probabilities evaluated at grid |
n.ik |
An array of expected counts n_{cikg} of ability class c at item i at category k in group g |
G |
Number of groups |
D |
Number of dimensions of θ |
I |
Number of items |
N |
Number of persons |
delta |
Parameter estimates for skillspace representation |
covdelta |
Covariance matrix of parameter estimates for skillspace representation |
data |
Original data frame |
group.stat |
Group statistics (sample sizes, group labels) |
p.xi.aj |
Individual likelihood |
posterior |
Individual posterior distribution |
skill.levels |
Number of skill levels per dimension |
K.item |
Maximal category per item |
theta.k |
Used theta design or estimated theta trait distribution
in case of |
thetaDes |
Used theta design for item responses |
se.theta.k |
Estimated standard errors of |
time |
Info about computation time |
skillspace |
Used skillspace parametrization |
iter |
Number of iterations |
converged |
Logical indicating whether convergence was achieved. |
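Several of these entries are simple summaries of the trait distribution and of the individual posteriors. The sketch below computes the mean, standard deviation and skewness of a discrete trait distribution (cf. mean.trait, sd.trait, skewness.trait), here applied to the distribution estimated in Example 4, and EAP estimates together with one common definition of EAP reliability: the variance of the EAPs divided by that variance plus the mean posterior variance. Whether CDM computes EAP.rel with exactly this formula is an assumption; trait_moments and eap_summary are hypothetical helpers, their returned names are illustrative, and the posterior matrix is a toy input.

```r
# Summaries of a discrete trait distribution and of posteriors (sketch).
trait_moments <- function(theta.k, pi.k) {
  pi.k <- pi.k / sum(pi.k)                       # guard against rounding in pi.k
  m <- sum(pi.k * theta.k)
  s <- sqrt(sum(pi.k * (theta.k - m)^2))
  c(mean = m, sd = s, skewness = sum(pi.k * (theta.k - m)^3) / s^3)
}

# EAP estimates and EAP reliability (assumed definition, see above)
eap_summary <- function(post, theta.k) {
  eap <- as.vector(post %*% theta.k)             # posterior means
  post_var <- as.vector(post %*% theta.k^2) - eap^2
  var_eap <- mean((eap - mean(eap))^2)
  list(EAP = eap, SD.EAP = sqrt(post_var),
       EAP.rel = var_eap / (var_eap + mean(post_var)))
}

# estimated skill distribution reported in Example 4
trait_moments(theta.k = c(-1.988, -0.055, 0.940, 2.000),
              pi.k = c(0.24813, 0.23313, 0.40059, 0.11816))

# toy posteriors of three persons on a three-point grid
post <- rbind(c(.8, .2, 0), c(.1, .8, .1), c(0, .2, .8))
eap_summary(post, theta.k = c(-1, 0, 1))$EAP.rel   # 0.7111111 (= 32/45)
```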
object |
Object of class gdm |
x |
Object of class gdm |
perstype |
Person parameter estimate type. Can be either
|
group |
Group which should be used for |
barwidth |
Bar width in |
histcol |
Color of histogram bars in |
cexcor |
Font size for print of correlation in |
pchpers |
Point type for scatter plot of person
parameters in |
cexpers |
Point size for scatter plot of person
parameters in |
... |
Optional parameters to be passed to or from other methods will be ignored. |
Bacci, S., Bartolucci, F., & Gnaldi, M. (2012). A class of multidimensional latent class IRT models for ordinal polytomous item responses. arXiv preprint, arXiv:1201.4667.
Bartolucci, F. (2007). A class of multidimensional IRT models for testing unidimensionality and clustering items. Psychometrika, 72, 141-157.
Berlinet, A. F., & Roland, C. (2012). Acceleration of the EM algorithm: P-EM versus epsilon algorithm. Computational Statistics & Data Analysis, 56(12), 4122-4137.
von Davier, M. (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, 61, 287-307.
von Davier, M., & Yamamoto, K. (2004). Partially observed mixtures of IRT models: An extension of the generalized partial-credit model. Applied Psychological Measurement, 28, 389-406.
Xu, X., & von Davier, M. (2008). Fitting the structured general diagnostic model to NAEP data. ETS Research Report ETS RR-08-27. Princeton, ETS.
For assessment of model fit see modelfit.cor.din and anova.gdm.
See itemfit.sx2 for item fit statistics.
For the estimation of the multidimensional latent class item response model, see the MultiLCIRT package and the sirt package (function sirt::rasch.mirtlc).
#############################################################################
# EXAMPLE 1: Fraction Dataset 1
# Unidimensional Models for dichotomous data
#############################################################################

data(data.fraction1, package="CDM")
dat <- data.fraction1$data
theta.k <- seq( -6, 6, len=15 )   # discretized ability

#***
# Model 1: Rasch model (normal distribution)
mod1 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, skillspace="normal",
            centered.latent=TRUE)
summary(mod1)
plot(mod1)

#***
# Model 2: Rasch model (log-linear smoothing)
# set the item difficulty of the 8th item to zero
b.constraint <- matrix( c(8,1,0), 1, 3 )
mod2 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k,
            skillspace="loglinear", b.constraint=b.constraint )
summary(mod2)

#***
# Model 3: 2PL model
mod3 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k,
            skillspace="normal", standardized.latent=TRUE )
summary(mod3)

## Not run:
#***
# Model 4: include quadratic term in item response function
#   using the argument decrease.increments=TRUE leads to a more
#   stable estimate
thetaDes <- cbind( theta.k, theta.k^2 )
colnames(thetaDes) <- c( "F1", "F1q" )
mod4 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k,
            thetaDes=thetaDes, skillspace="normal",
            standardized.latent=TRUE, decrease.increments=TRUE)
summary(mod4)

#***
# Model 5: step function for ICC
#   two different probabilities theta < 0 and theta > 0
thetaDes <- matrix( 1*(theta.k>0), ncol=1 )
colnames(thetaDes) <- c( "Fgrm1" )
mod5 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k,
            thetaDes=thetaDes, skillspace="normal" )
summary(mod5)

#***
# Model 6: DINA model with din function
mod6 <- CDM::din( dat, q.matrix=matrix( 1, nrow=ncol(dat), ncol=1 ) )
summary(mod6)

#***
# Model 7: Estimating a version of the DINA model with gdm
theta.k <- c(-.5,.5)
mod7 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k, skillspace="loglinear" )
summary(mod7)

#############################################################################
# EXAMPLE 2: Cultural Activities - data.Students
# Unidimensional Models for polytomous data
#############################################################################

data(data.Students, package="CDM")
dat <- data.Students
dat <- dat[, grep( "act", colnames(dat) ) ]
theta.k <- seq( -4, 4, len=11 )   # discretized ability

#***
# Model 1: Partial Credit Model (PCM)
mod1 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, skillspace="normal",
            centered.latent=TRUE)
summary(mod1)
plot(mod1)

#***
# Model 1b: PCM using frequency patterns
mod1b <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, skillspace="normal",
            centered.latent=TRUE, use.freqpatt=TRUE)
summary(mod1b)

#***
# Model 2: PCM with two groups
mod2 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k,
            group=CDM::data.Students$urban + 1,
            skillspace="normal", centered.latent=TRUE)
summary(mod2)

#***
# Model 3: PCM with loglinear smoothing
b.constraint <- matrix( c(1,2,0), ncol=3 )
mod3 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k,
            skillspace="loglinear", b.constraint=b.constraint )
summary(mod3)

#***
# Model 4: Model with pre-specified item weights in Q-matrix
Qmatrix <- array( 1, dim=c(5,1,2) )
Qmatrix[,1,2] <- 2    # default is score 2 for category 2
# now change the scoring of items 2 and 4:
Qmatrix[c(2,4),1,1] <- .74
Qmatrix[c(2,4),1,2] <- 2.3
# for items 2 and 4 the score for category 1 is .74 and for category 2 it is 2.3
mod4 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, Qmatrix=Qmatrix,
            skillspace="normal", centered.latent=TRUE)
summary(mod4)

#***
# Model 5: Generalized partial credit model
mod5 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k,
            skillspace="normal", standardized.latent=TRUE )
summary(mod5)

#***
# Model 6: Item-category slope estimation
mod6 <- CDM::gdm( dat, irtmodel="2PLcat", theta.k=theta.k,
            skillspace="normal", standardized.latent=TRUE,
            decrease.increments=TRUE)
summary(mod6)

#***
# Models 7: items with different numbers of categories
dat0 <- dat
dat0[ paste(dat0[,1])==2, 1 ] <- 1   # 1st item has only two categories
dat0[ paste(dat0[,3])==2, 3 ] <- 1   # 3rd item has only two categories

# Model 7a: PCM
mod7a <- CDM::gdm( dat0, irtmodel="1PL", theta.k=theta.k, centered.latent=TRUE )
summary(mod7a)

# Model 7b: Item category slopes
mod7b <- CDM::gdm( dat0, irtmodel="2PLcat", theta.k=theta.k,
            standardized.latent=TRUE, decrease.increments=TRUE )
summary(mod7b)

#############################################################################
# EXAMPLE 3: Fraction Dataset 2
# Multidimensional Models for dichotomous data
#############################################################################

data(data.fraction2, package="CDM")
dat <- data.fraction2$data
Qmatrix <- data.fraction2$q.matrix3

#***
# Model 1: One-dimensional Rasch model
theta.k <- seq( -4, 4, len=11 )   # discretized ability
mod1 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, centered.latent=TRUE)
summary(mod1)
plot(mod1)

#***
# Model 2: One-dimensional 2PL model
mod2 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k, standardized.latent=TRUE)
summary(mod2)
plot(mod2)

#***
# Model 3: 3-dimensional Rasch Model (normal distribution)
mod3 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, Qmatrix=Qmatrix,
            centered.latent=TRUE, globconv=5*1E-3, conv=1E-4 )
summary(mod3)

#***
# Model 4: 3-dimensional Rasch model (loglinear smoothing)
# set some item parameters of items 4,1 and 2 to zero
b.constraint <- cbind( c(4,1,2), 1, 0 )
mod4 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, Qmatrix=Qmatrix,
            b.constraint=b.constraint, skillspace="loglinear" )
summary(mod4)

#***
# Model 5: define a different theta grid for each dimension
theta.k <- list( "Dim1"=seq( -5, 5, len=11 ), "Dim2"=seq(-5,5,len=8),
            "Dim3"=seq( -3,3,len=6) )
mod5 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, Qmatrix=Qmatrix,
            b.constraint=b.constraint, skillspace="loglinear")
summary(mod5)

#***
# Model 6: multidimensional 2PL model (normal distribution)
theta.k <- seq( -5, 5, len=13 )
a.constraint <- cbind( c(8,1,3), 1:3, 1, 1 )   # fix some slopes to 1
mod6 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k, Qmatrix=Qmatrix,
            centered.latent=TRUE, a.constraint=a.constraint,
            decrease.increments=TRUE, skillspace="normal")
summary(mod6)

#***
# Model 7: multidimensional 2PL model (loglinear distribution)
a.constraint <- cbind( c(8,1,3), 1:3, 1, 1 )
b.constraint <- cbind( c(8,1,3), 1, 0 )
mod7 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k, Qmatrix=Qmatrix,
            b.constraint=b.constraint, a.constraint=a.constraint,
            decrease.increments=FALSE, skillspace="loglinear")
summary(mod7)

#############################################################################
# EXAMPLE 4: Unidimensional latent class 1PL IRT model
#############################################################################

# simulate data
set.seed(754)
I <- 20       # number of items
N <- 2000     # number of persons
theta <- c( -2, 0, 1, 2 )
theta <- rep( theta, c(N/4,N/4, 3*N/8, N/8) )
b <- seq(-2,2,len=I)
library(sirt)
# use function sim.raschtype from sirt package
dat <- sirt::sim.raschtype( theta=theta, b=b )
theta.k <- seq(-1, 1, len=4)   # initial vector of theta

# estimate model
mod1 <- CDM::gdm( dat, theta.k=theta.k, skillspace="est", irtmodel="1PL",
            centerintercepts=TRUE, maxiter=200)
summary(mod1)
  ##  Estimated Skill Distribution
  ##        F1    pi.k
  ##  1 -1.988 0.24813
  ##  2 -0.055 0.23313
  ##  3  0.940 0.40059
  ##  4  2.000 0.11816

#############################################################################
# EXAMPLE 5: Multidimensional latent class IRT model
#############################################################################

# We simulate a two-dimensional IRT model in which theta vectors
# are observed at a fixed discrete grid (see below).

# simulate data
set.seed(754)
I <- 13       # number of items
N <- 2400     # number of persons
# simulate Dimension 1 at 4 discrete theta points
theta <- c( -2, 0, 1, 2 )
theta <- rep( theta, c(N/4,N/4, 3*N/8, N/8) )
b <- seq(-2,2,len=I)
library(sirt)
# use simulation function from sirt package
dat1 <- sirt::sim.raschtype( theta=theta, b=b )
# simulate Dimension 2 at 4 discrete theta points
theta <- c( -3, 0, 1.5, 2 )
theta <- rep( theta, c(N/4,N/4, 3*N/8, N/8) )
dat2 <- sirt::sim.raschtype( theta=theta, b=b )
colnames(dat2) <- gsub( "I", "U", colnames(dat2))
dat <- cbind( dat1, dat2 )

# define Q-matrix
Qmatrix <- matrix(0,2*I,2)
Qmatrix[ cbind( 1:(2*I), rep(1:2, each=I) ) ] <- 1

theta.k <- seq(-1, 1, len=4)   # initial matrix
theta.k <- cbind( theta.k, theta.k )
colnames(theta.k) <- c("Dim1","Dim2")

# estimate model
mod2 <- CDM::gdm( dat, theta.k=theta.k, skillspace="est", irtmodel="1PL",
            Qmatrix=Qmatrix, centerintercepts=TRUE)
summary(mod2)
  ## Estimated Skill Distribution
  ##   theta.k.Dim1 theta.k.Dim2    pi.k
  ## 1       -2.022       -3.035 0.25010
  ## 2        0.016        0.053 0.24794
  ## 3        0.956        1.525 0.36401
  ## 4        1.958        1.919 0.13795

#############################################################################
# EXAMPLE 6: Large-scale dataset data.mg
#############################################################################

data(data.mg, package="CDM")
dat <- data.mg[, paste0("I", 1:11 ) ]
theta.k <- seq(-6,6,len=21)

#***
# Model 1: Generalized partial credit model with multiple groups
mod1 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k, group=CDM::data.mg$group,
            skillspace="normal", standardized.latent=TRUE)
summary(mod1)
## End(Not run)