Count number of pairwise cases for a data set with missing (NA) data and impute values.
When doing cor(x, use = "pairwise"), it is helpful to know the number of cases behind each pairwise correlation. This is particularly important for SAPA-type analyses. More importantly, when some pairs have no observations, it is useful to supply imputed values so that further analyses may be done. This situation arises with the Massively Missing Completely at Random (MMCAR) designs used by the SAPA project. The specific missing pairs may be identified by pairwiseZero, and summaries of the counts are given by pairwiseDescribe.
pairwiseCount(x, y = NULL, diagonal = TRUE)
pairwiseDescribe(x, y, diagonal = FALSE, ...)
pairwiseZero(x, y = NULL, min = 0, short = TRUE)
pairwiseImpute(keys, R, fix = FALSE)
pairwiseReport(x, y = NULL, cut = 0, diagonal = FALSE, ...)
pairwiseSample(x, y = NULL, diagonal = FALSE, size = 100, ...)
pairwisePlot(x, y = NULL, upper = TRUE, diagonal = TRUE, labels = TRUE,
    show.legend = TRUE, n.legend = 10, colors = FALSE, gr = NULL, minlength = 6,
    xlas = 1, ylas = 2, main = "Relative Frequencies", count = TRUE, ...)
count.pairwise(x, y = NULL, diagonal = TRUE)  # deprecated
x |
An input matrix, typically a data matrix ready to be correlated. |
y |
An optional second input matrix |
diagonal |
if TRUE, then report the diagonal, else fill the diagonals with NA |
... |
Other parameters to pass to describe |
min |
Count the number of item pairs with <= min entries |
short |
Show the table of the item pairs that have entries <= min |
keys |
A keys.list specifying which items belong to which scale. |
R |
A correlation matrix to be described or imputed |
cut |
Report the item pairs and numbers with cell sizes less than cut |
fix |
If TRUE, then replace all NA correlations with the mean correlation for the corresponding within-scale or between-scale block |
upper |
Should the upper off-diagonal triangle be drawn, or left blank? |
labels |
if NULL, use column and row names, otherwise use labels |
show.legend |
A legend (key) to the colors is shown on the right hand side |
n.legend |
How many categories should be labelled in the legend? |
colors |
Defaults to FALSE, which uses a grey scale. colors=TRUE uses colors from the colorRampPalette running from red through white to blue |
minlength |
If not NULL, then the maximum number of characters to use in row/column labels |
xlas |
Orientation of the x axis labels (0 = parallel to the axis, 1 = horizontal, 2 = perpendicular to the axis) |
ylas |
Orientation of the y axis labels (0 = parallel to the axis, 1 = horizontal, 2 = perpendicular to the axis) |
main |
A title. Defaults to "Relative Frequencies" |
gr |
A color gradient: e.g., gr <- colorRampPalette(c("#B52127", "white", "#2171B5")) will produce slightly more pleasing (to some) colors. See next to last example of |
count |
Should we count the number of pairwise observations using pairwiseCount, or just plot the counts for a matrix? |
size |
The number of variables to sample in pairwiseSample |
When using the Massively Missing Completely at Random (MMCAR) designs of the SAPA project, it is important to count the number of pairwise observations (pairwiseCount). Pairs with 1 or fewer observations produce NA correlations, which make subsequent factor analyses (fa) or reliability analyses (omega or scoreOverlap) impossible.
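To make the counting step concrete, here is a minimal base-R sketch (our own illustration, not the psych implementation): the cross-product of the observed-value indicator matrix `!is.na(x)` with itself gives, in cell (i, j), the number of rows in which both variable i and variable j are present. The helper name `count_pairs` is hypothetical.

```r
# Hypothetical helper (not part of psych): count pairwise-complete observations.
count_pairs <- function(x, y = NULL, diagonal = TRUE) {
  x <- as.matrix(x)
  obs_x <- (!is.na(x)) * 1               # 1 where x is observed, 0 where NA
  if (is.null(y)) {
    n <- crossprod(obs_x)                # t(obs_x) %*% obs_x: joint counts
    if (!diagonal) diag(n) <- NA         # mirror the 'diagonal' argument
  } else {
    obs_y <- (!is.na(as.matrix(y))) * 1
    n <- crossprod(obs_x, obs_y)         # variables of x by variables of y
  }
  n
}

set.seed(42)
x <- matrix(rnorm(60), ncol = 3)
x[x < -1] <- NA                          # knock out some values
n <- count_pairs(x)
print(n)
```

The diagonal of the result is simply the number of non-missing values per variable, and any off-diagonal cell of 0 or 1 flags a pair whose correlation cannot be estimated.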
To identify item pairs with counts less than a certain value, pairwiseReport reports the names of the pairs with fewer than 'cut' observations. By default, it reports only the number of offending items; with short=FALSE, the printed output lists the items with n.obs < cut. Even more detail is available in the returned object.
The specific pairs that have values <= min in any particular table of pairwise counts may be identified by pairwiseZero.
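The screening done by pairwiseReport (n < cut) and pairwiseZero (n <= min) can be sketched in base R as below. The helper `low_pairs` and its `inclusive` flag are hypothetical, and the sketch assumes a square, symmetric count matrix whose rows and columns share the same names.

```r
# Hypothetical helper (not part of psych): list each unordered variable pair
# whose pairwise count is below `cut` (strict comparison), or at most `cut`
# when inclusive = TRUE.
low_pairs <- function(n, cut, inclusive = FALSE) {
  bad <- if (inclusive) n <= cut else n < cut
  bad[lower.tri(bad, diag = TRUE)] <- FALSE   # keep each pair once, skip diagonal
  idx <- which(bad, arr.ind = TRUE)           # row/column positions of offenders
  nm <- colnames(n)
  if (is.null(nm)) nm <- paste0("V", seq_len(ncol(n)))
  data.frame(item1 = nm[idx[, "row"]],
             item2 = nm[idx[, "col"]],
             n.obs = n[idx],
             stringsAsFactors = FALSE)
}

# A small, made-up count matrix for illustration.
counts <- matrix(c(10, 0, 1,
                    0, 12, 5,
                    1, 5, 8),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
low_pairs(counts, cut = 2)                    # pairs a-b (0) and a-c (1)
low_pairs(counts, cut = 0, inclusive = TRUE)  # only a-b
```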
To remedy the problem of missing correlations, we impute the missing correlations using pairwiseImpute.
The technique takes advantage of the scale-based structure of SAPA items. Items within one scale (e.g., Letter Number Series, similar to the ability items) are imputed to correlate with items from another scale (e.g., Matrix Reasoning) at the mean inter-item correlation observed between those two scales. The average correlations within and between scales are reported by pairwiseImpute, and if the fix parameter is TRUE, the imputed correlation matrix is returned.
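The block-mean idea can be sketched in base R as follows. This is an illustration under stated assumptions, not the pairwiseImpute source: the helper `impute_by_scale` is hypothetical, it assumes every item belongs to exactly one scale (non-overlapping keys), and it only echoes the names av.r, count and imputed from the Value section.

```r
# Hypothetical sketch (not psych's code) of block-mean imputation.
# Assumes R has dimnames and each item appears in exactly one element of keys.
impute_by_scale <- function(keys, R) {
  scales <- names(keys)
  k <- length(scales)
  av.r  <- matrix(NA_real_, k, k, dimnames = list(scales, scales))
  count <- matrix(0L, k, k, dimnames = list(scales, scales))
  imputed <- R
  for (a in seq_len(k)) {
    for (b in seq_len(k)) {
      rows <- keys[[a]]
      cols <- keys[[b]]
      block <- R[rows, cols, drop = FALSE]
      if (a == b) diag(block) <- NA             # drop the unit diagonal within a scale
      av.r[a, b]  <- mean(block, na.rm = TRUE)  # NaN if the whole block is missing
      count[a, b] <- sum(!is.na(block))         # observed correlations in the block
      # Replace only the missing cells of this block with the block mean.
      filled <- R[rows, cols, drop = FALSE]
      filled[is.na(filled)] <- av.r[a, b]
      imputed[rows, cols] <- filled
    }
  }
  list(av.r = av.r, count = count, imputed = imputed)
}

# Two made-up scales of two items each; two cross-scale pairs were never observed.
items <- c("a1", "a2", "b1", "b2")
R <- diag(4)
dimnames(R) <- list(items, items)
R["a1", "a2"] <- R["a2", "a1"] <- 0.5
R["b1", "b2"] <- R["b2", "b1"] <- 0.7
R["a1", "b1"] <- R["b1", "a1"] <- 0.2
R["a2", "b2"] <- R["b2", "a2"] <- 0.4
R["a1", "b2"] <- R["b2", "a1"] <- NA
R["a2", "b1"] <- R["b1", "a2"] <- NA
keys <- list(A = c("a1", "a2"), B = c("b1", "b2"))
out <- impute_by_scale(keys, R)
out$av.r
out$imputed
```

Here the two missing A-B correlations are filled with the mean of the observed A-B correlations (0.2 and 0.4), i.e. 0.3. A block with no observed correlations at all stays missing, which matches the idea that there is nothing to average.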
Alternative methods of imputing these correlations are not yet implemented.
The time needed to count cell sizes grows linearly with the number of subjects and with the square of the number of items, which becomes prohibitive for data sets with many items. pairwiseSample takes random samples of size variables from these larger data sets and returns basic descriptive statistics of the resulting counts, including the mean, median, and the .05, .25, .5, .75 and .95 quantiles.
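Because a cross-product count on n subjects and p items costs on the order of n times p squared operations, counting only a random subset of columns is much cheaper. A base-R sketch of that idea follows; `sample_pair_counts` is a hypothetical helper, it ignores the y argument of pairwiseSample, and it summarizes only the unique off-diagonal pairs.

```r
# Hypothetical sketch (not psych's code): estimate the distribution of pairwise
# counts from a random subset of `size` variables.
sample_pair_counts <- function(x, size = 100,
                               probs = c(.05, .25, .5, .75, .95)) {
  x <- as.matrix(x)
  size <- min(size, ncol(x))
  cols <- sample(ncol(x), size)              # random subset of variables
  obs <- (!is.na(x[, cols, drop = FALSE])) * 1
  n <- crossprod(obs)                        # counts for the sampled variables only
  off <- n[upper.tri(n)]                     # each unordered pair once
  c(mean = mean(off), quantile(off, probs))  # the .5 quantile is the median
}

set.seed(1)
big <- matrix(rnorm(500 * 400), ncol = 400)
big[runif(length(big)) < 0.8] <- NA          # MMCAR-style: about 80% missing
s <- sample_pair_counts(big, size = 50)
print(round(s, 1))
```

With about 20% of cells observed at random, each pair is expected to share roughly 500 x 0.2 x 0.2 = 20 observations, so the sampled mean should land near that value.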
result |
A matrix of counts of pairwise observations (if pairwiseCount) |
av.r |
The average correlation value of the observed correlations within/between scales |
count |
The number of observed correlations within/between each scale |
percent |
The percentage of complete data by scale |
imputed |
The original correlation matrix with NA values replaced by the mean correlation for items within/between the appropriate scale. |
Maintainer: William Revelle revelle@northwestern.edu
library(psych)  # pairwiseCount, pairwiseDescribe, pairwiseImpute and cs
x <- matrix(rnorm(900), ncol = 6)
y <- matrix(rnorm(450), ncol = 3)
x[x < 0] <- NA
y[y > 1] <- NA
pairwiseCount(x)
pairwiseCount(y)
pairwiseCount(x, y)
pairwiseCount(x, diagonal = FALSE)
pairwiseDescribe(x, quant = c(.1, .25, .5, .75, .9))
# examine the structure of the ability data set
keys <- list(ICAR16 = colnames(psychTools::ability),
             reasoning = cs(reason.4, reason.16, reason.17, reason.19),
             letters = cs(letter.7, letter.33, letter.34, letter.58),
             matrix = cs(matrix.45, matrix.46, matrix.47, matrix.55),
             rotate = cs(rotate.3, rotate.4, rotate.6, rotate.8))
pairwiseImpute(keys, psychTools::ability)