Analyzing multinomial data
These functions facilitate the conversion and analysis of multinomial data as as series of nested binomial data.
The main interface is the multi
“family”, to be used in the family
argument of the fitting functions.
Fits using it call binomialize
, which can be called directly to check how the data are converted to nested binomial data, and to use these data directly.
The fitted.HLfitlist
method of the fitted
generic function returns a matrix of fitted multinomial probabilities.
The logLik.HLfitlist
method of the logLik
generic function returns a log-likelihood for the joint fits.
multi(binResponse=c("npos","nneg"),binfamily=binomial(),input="types",...) binomialize(data,responses,sortedTypes=NULL,binResponse=c("npos","nneg"), depth=Inf,input="types") ## S3 method for class 'HLfitlist' fitted(object, version=2L, ...) ## S3 method for class 'HLfitlist' logLik(object,which,...)
data |
The data frame to be analyzed. |
object |
A list of binomial fits returned by a multinomial analysis |
responses |
column names of the data, such that |
sortedTypes |
Names of multinomial types, i.e. levels of the multinomial response factors. Their order determines which types are taken first to define the nested binomial samples. By default, the most common types are considered first. |
binResponse |
The names to be given to the number of “success” and “failures” in the binomial response. |
depth |
The maximum number of nested binomial responses to be generated from the multinomial data. |
binfamily |
The family applied to each binomial response. |
input |
If |
which |
Which element of the |
version |
Integer, for |
... |
Other arguments passed from or to other functions. |
A multinomial response, say counts 17, 13, 25, 8, 3, 1 for types type1
to type6
, can be represented as a series of nested binomials
e.g. type1
against others (17 vs 50) then among these 50 others, type2
versus others (13 vs 37), etc.
The binomialize
function generates such a representation. By default the representation considers types in decreasing order of the number of positives, i.e. first type3
against others (25 vs 42), then type1
against others within these 42, etc. It stops if it has reached depth
nested binomial responses. This can be modified by the sortedTypes
argument, e.g. sortedTypes=c("type6","type4","type2")
.
binomialize
returns a list of data frames which can be directly provided as a data
argument for the fitting functions, with binomial response.
Alternatively, one can provide the multinomial response data frame, which will be internally converted to nested binomial data if the family
argument is a call to multinomial
(see Examples).
For mixed models, the multinomial data can be fitted to a model with the same correlation parameters, and either the same or different variances of random effects, for all binomial responses. Which analysis is performed depends on whether the variances are fitted by “outer optimization” or by HLfit
's “inner iterative” algorithm, as controlled by the init
or init.corrHLfit
arguments (see Examples). These initial values therefore affect the definition of the model being fitted.
corrHLfit
will fit different variances by default. Adding an init.corrHLfit
will force estimation of a single variance across models. fitme
's default optimization strategy is more complex, and has changed and still change over versions. This creates a back-compatibility issue where the model to be fitted may change over versions of spaMM. To avoid that, it is strongly advised to use an explicit initial value when fitting a multi
model by fitme
.
binomialize
returns a list of data frames appropriate for analysis as binomial response. Each data frame contains the original one plus
two columns named according to binResponse
.
The main fitting functions, when called on a model with family=multi(.)
, return an object of class HLfitlist
, which is a list with attributes. The list elements are fits of the nested binomial models (objects of class HLfit
). The attributes provide additional information about the overall multinomial model, such as global log-likelihood values and other information properly extracted by the how()
function.
multi
is a function that returns a list, but users may never need to manipulate this output.
fitted.HLfitlist
returns a matrix. The current default version=2L
provides meaningful fitted values (predicted multinomial frequencies for each response type) even for data rows where the nested binomial fit for a type had no response information remaining. By contrast, the first version provided a matrix with 0
s for these row*fit combinations, except for the last column; in many cases this may be confusing.
## Adding colour to the famous 'iris' dataset: iriscol <- iris set.seed(123) # Simulate colours, then fit colour frequencies: iriscol$col <- sample(c("yellow", "purple", "blue"),replace = TRUE, size = nrow(iriscol), prob=c(0.5,0.3,0.2)) colfit <- fitme(cbind(npos,nneg) ~ 1+(1|Species), family=multi(responses="col"), data=iriscol, init=list(lambda=NA)) # note warning if no 'init'... head(fitted(colfit)) # To only generate the binomial datasets: binomialize(iriscol,responses="col") ## An example considering pseudo-data at one diploid locus for 50 individuals set.seed(123) genecopy1 <- sample(4,size=50,prob=c(1/2,1/4,1/8,1/8),replace=TRUE) genecopy2 <- sample(4,size=50,prob=c(1/2,1/4,1/8,1/8),replace=TRUE) alleles <- c("122","124","126","128") genotypes <- data.frame(type1=alleles[genecopy1],type2=alleles[genecopy2]) ## Columns "type1","type2" each contains an allele type => input is "types" (the default) datalist <- binomialize(genotypes,responses=c("type1","type2")) ## two equivalent fits: f1 <- HLfit(cbind(npos,nneg)~1,data=datalist, family=binomial()) f2 <- HLfit(cbind(npos,nneg)~1,data=genotypes, family=multi(responses=c("type1","type2"))) fitted(f2) if (spaMM.getOption("example_maxtime")>1.7) { ##### Control of lambda estimation over different binomial submodels genoInSpace <- data.frame(type1=alleles[genecopy1],type2=alleles[genecopy2], x=runif(50),y=runif(50)) method <- "PQL" # for faster exampple ## Fitting distinct variances for all binomial responses: multifit <- corrHLfit(cbind(npos,nneg)~1+Matern(1|x+y),data=genoInSpace, family=multi(responses=c("type1","type2")), ranFix=list(rho=1,nu=0.5), method=method) length(unique(unlist(lapply(multifit, get_ranPars, which="lambda")))) # 3 multifit <- fitme(cbind(npos,nneg)~1+Matern(1|x+y),data=genoInSpace, family=multi(responses=c("type1","type2")), init=list(lambda=NaN), # forcing 'inner' estimation for fitme fixed=list(rho=1,nu=0.5), method=method) length(unique(unlist(lapply(multifit, get_ranPars, which="lambda")))) # 3 ## Fitting the same variance for all binomial responses: multifit <- fitme(cbind(npos,nneg)~1+Matern(1|x+y),data=genoInSpace, family=multi(responses=c("type1","type2")), init=list(lambda=NA), # forcing 'outer' estimation for fitme fixed=list(rho=1,nu=0.5), method=method) length(unique(unlist(lapply(multifit, get_ranPars, which="lambda")))) # 1 multifit <- corrHLfit(cbind(npos,nneg)~1+Matern(1|x+y),data=genoInSpace, family=multi(responses=c("type1","type2")), init.corrHLfit=list(lambda=1), # forcing 'outer' estimation for corrHLfit ranFix=list(rho=1,nu=0.5), method=method) length(unique(unlist(lapply(multifit, get_ranPars, which="lambda")))) # 1 }
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.