Plot the distribution of predictions for each class
```r
plotd(object, hist = FALSE, type = NULL, nresponse = NULL, dichot = FALSE,
      trace = FALSE, xlim = NULL, ylim = NULL, jitter = FALSE, main = NULL,
      xlab = "Predicted Value", ylab = if(hist) "Count" else "Density", lty = 1,
      col = c("gray70", 1, "lightblue", "brown", "pink", 2, 3, 4),
      fill = if(hist) col[1] else 0, breaks = "Sturges", labels = FALSE,
      kernel = "gaussian", adjust = 1, zero.line = FALSE, legend = TRUE,
      legend.names = NULL, legend.pos = NULL, cex.legend = .8, legend.bg = "white",
      legend.extra = FALSE, vline.col = 0, vline.thresh = .5, vline.lty = 1,
      vline.lwd = 1, err.thresh = vline.thresh, err.col = 0, err.border = 0,
      err.lwd = 1, xaxt = "s", yaxt = "s", xaxis.cex = 1, sd.thresh = 0.01, ...)
```
To start off, look at the arguments object, hist, and type.
For predict methods with multiple-column responses, see the nresponse argument.
For factor responses with more than two levels, see the dichot argument.

object: Model object. Typically a model which predicts a class or a class discriminant.
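
hist: FALSE (default) to estimate the per-class distributions with density; TRUE to plot histograms with hist instead.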

type: Type parameter passed to predict. The default is NULL.

nresponse: Which column to use when predict returns multiple columns. The default NULL selects all columns.

dichot: Dichotomise the predicted response.
This argument is ignored except for models where the observed response is a factor with more than two levels and the predicted response is a numeric vector.
The default is FALSE.

trace: Default FALSE. Use trace=TRUE to see how plotd partitions the response into classes.

xlim: Limits of the x axis. The default is NULL.

ylim: Limits of the y axis. The default is NULL.

jitter: Jitter the histograms or densities horizontally to minimize overplotting. Default is FALSE.

main: Main title. The default is NULL.

xlab: x axis label. Default is "Predicted Value".

ylab: y axis label. Default is "Count" if hist=TRUE, otherwise "Density".

lty: Per-class line types for the plotted lines. Default is 1 (which gets recycled for all lines).

col: Per-class line colors. The first few colors of the default are intended to be easily distinguishable on both color displays and monochrome printers.

fill: Fill color for the plot of the first class. The default is col[1] when hist=TRUE, and 0 otherwise.

breaks: Passed to hist. Default is "Sturges".

labels: Default is FALSE. Used together with hist=TRUE (see the examples).

kernel: Passed to density. Default is "gaussian".

adjust: Passed to density. Default is 1.

zero.line: Passed to plot.density. Default is FALSE.

legend: TRUE (default) to draw a legend.

legend.names: Class names in the legend. The default is NULL.
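
legend.pos: Position of the legend. The default is NULL. An explicit position can be given as x and y coordinates, for example legend.pos=c(.25,220) as in the examples.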

cex.legend: Text size of the legend. Default is 0.8.

legend.bg: Background color of the legend. Default is "white".

legend.extra: Show (in the legend) the number of occurrences of each class. Default is FALSE.

vline.thresh: Horizontal position of the optional vertical line. Default is 0.5.

vline.col: Color of the vertical line. Default is 0, meaning no vertical line.

vline.lty: Line type of the vertical line. Default is 1.

vline.lwd: Line width of the vertical line. Default is 1.

err.thresh: x axis value specifying the error shading threshold. See err.col. Default is vline.thresh.

err.col: Specify up to three colors to shade the "error areas" of the density plot.
The default is 0, meaning no error shading. For example:

```r
library(earth)
data(etitanic)
earth.mod <- earth(survived ~ ., data=etitanic)
plotd(earth.mod, vline.col=1, err.col=c(2,3,4))
```

The three areas are (i) the error area to the left of the threshold, (ii) the error area to the right of the threshold, and (iii) the reducible error area.
If fewer than three values are specified, err.col is recycled.

err.border: Borders around the error shading. Default is 0.

err.lwd: Line widths of the borders of the error shading. Default is 1.

xaxt: Default is "s".

yaxt: Default is "s".

xaxis.cex: Only used if … Default is 1.

sd.thresh: Minimum acceptable standard deviation for a density. Default is 0.01.

...: Extra arguments passed to the predict method for the object.

This function calls predict with the data originally used to build the model, and with the type specified above.
It then separates the predicted values into classes, where the class for each predicted value is determined by the class of the observed response.
Finally, it calls density (or hist if hist=TRUE) for each class-specific set of values, and plots the results.
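The steps above can be sketched in base R without earth. This is a minimal illustration under stated assumptions, not plotd's implementation. It uses a logistic regression of am on wt in the built-in mtcars data, chosen only for illustration. The sketch predicts on the training data, splits the predictions by the observed class, and draws one density per class.

```r
# Minimal base-R sketch of the steps plotd performs (illustrative, not earth's code).
# Assumption: a binomial glm on mtcars, with am (0 = automatic, 1 = manual) as the class.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# 1. Predict on the data used to build the model.
pred <- predict(fit, type = "response")

# 2. Partition the predictions by the observed response.
observed <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))
by.class <- split(pred, observed)

# 3. Estimate one density per class (same kernel and adjust defaults as plotd).
dens <- lapply(by.class, density, kernel = "gaussian", adjust = 1)

# 4. Plot the class densities on common axes.
xlim <- range(sapply(dens, function(d) range(d$x)))
ylim <- c(0, max(sapply(dens, function(d) max(d$y))))
plot(dens[[1]], xlim = xlim, ylim = ylim, main = "", xlab = "Predicted Value",
     col = "gray70", zero.line = FALSE)
lines(dens[[2]], col = 1)
abline(v = 0.5)  # analogous to vline.thresh = .5 (plotd draws it only when vline.col is nonzero)
legend("topright", legend = names(dens), col = c("gray70", 1), lty = 1,
       cex = 0.8, bg = "white")
```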
This function estimates distributions with the density and hist functions, and also calls plot.density and plot.histogram.
For an overview, see Venables and Ripley, MASS, section 5.6.
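With hist=TRUE the per-class estimates are histograms. Below is a sketch of that variant, again using the illustrative mtcars model. It computes one set of breaks for all classes using the same rule hist applies for breaks = "Sturges", then overlays the class histograms with plot.histogram. Whether plotd shares breaks across classes in this way is an assumption of the sketch.

```r
# Sketch of the hist = TRUE variant (illustrative, not earth's code).
fit <- glm(am ~ wt, data = mtcars, family = binomial)
pred <- predict(fit, type = "response")
observed <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))
by.class <- split(pred, observed)

# Common breaks: the rule hist() uses internally for breaks = "Sturges".
brks <- pretty(range(pred), n = nclass.Sturges(pred), min.n = 1)

# Compute (but do not draw) one histogram per class.
h <- lapply(by.class, hist, breaks = brks, plot = FALSE)

ymax <- max(sapply(h, function(x) max(x$counts)))
plot(h[[1]], col = "gray70", ylim = c(0, ymax), main = "",
     xlab = "Predicted Value", ylab = "Count")
plot(h[[2]], add = TRUE, border = 1, col = NA)  # outline only, so the first class stays visible
```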
Partitioning the response into classes
Considerable effort is made to partition the predicted response into classes in a sensible way.
This is not always possible for multiple-column responses, and the nresponse argument should be used where necessary.
The partitioning details depend on the types and numbers of columns in the observed and predicted responses.
These in turn depend on the model object and the type argument.
Use the trace argument to see how plotd partitions the response for your model.
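One way to picture the partitioning, and why nresponse matters for multiple-column predictions, is the hypothetical helper partition_predictions below. It is not part of earth. The helper refuses a multiple-column prediction unless a column is selected, then splits the chosen column by the observed class. The lda model on the built-in iris data, from the MASS package, is used only for illustration.

```r
# Hypothetical helper, not part of earth: a simplified partitioning step.
# Assumes the observed response can be coerced to a factor.
partition_predictions <- function(pred, observed, nresponse = NULL) {
  observed <- as.factor(observed)
  if (is.matrix(pred) && ncol(pred) > 1) {
    if (is.null(nresponse))
      stop("the predicted response has ", ncol(pred),
           " columns: use nresponse to select one")
    pred <- pred[, nresponse]          # select a single column
  }
  pred <- as.vector(pred)
  stopifnot(length(pred) == length(observed))
  split(pred, observed)                # one vector of predictions per observed class
}

library(MASS)                          # lda(); MASS is a recommended package shipped with R
fit <- lda(Species ~ ., data = iris)
post <- predict(fit)$posterior         # 150 x 3 matrix, one column per species
parts <- partition_predictions(post, iris$Species, nresponse = 3)  # virginica column
lengths(parts)                         # 50 predictions per observed species
```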
Degenerate densities
A message such as

Warning: standard deviation of "male" density is 0, density is degenerate?

means that the density for that class will not be plotted (the legend will say "not plotted").
Set sd.thresh=0 to get rid of this check, but be aware that histograms (and sometimes x axis labels) for degenerate densities will be misleading.
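The sd.thresh check described above can be sketched with the hypothetical function check_degenerate. It is not earth's code and only mimics the documented behaviour. It computes each class's standard deviation, warns about classes below the threshold, and reports which densities would be plotted. The per-class predictions here are made up for illustration.

```r
# Hypothetical sketch of an sd.thresh check (not earth's code).
# by.class is a named list of per-class predictions.
check_degenerate <- function(by.class, sd.thresh = 0.01) {
  sds <- vapply(by.class, function(x) if (length(x) > 1) sd(x) else 0, numeric(1))
  for (name in names(sds)[sds < sd.thresh])
    warning("standard deviation of \"", name, "\" density is ",
            signif(sds[[name]], 3), ", density is degenerate?", call. = FALSE)
  sds >= sd.thresh                     # TRUE for classes whose density would be plotted
}

# Made-up predictions: every "male" prediction is identical, so its sd is 0.
by.class <- list(female = c(0.20, 0.40, 0.35, 0.60), male = c(1, 1, 1, 1))
check_degenerate(by.class)                  # warns about "male"; male is FALSE
check_degenerate(by.class, sd.thresh = 0)   # check disabled: both TRUE, no warning
```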
Using plotd for various models
This function is included in the earth package but can also be used with other models.
Example with glm:

```r
library(earth)
data(etitanic)
glm.model <- glm(sex ~ ., data=etitanic, family=binomial)
plotd(glm.model)
```

Example with lm:

```r
library(earth)
data(etitanic)
lm.model <- lm(as.numeric(sex) ~ ., data=etitanic)
plotd(lm.model)
```
Using plotd with lda or qda
For these models the type argument can be one of:

- "response" (default): the linear discriminant
- "ld": same as "response"
- "class": the predicted classes
- "posterior": the posterior probabilities

Example:

```r
library(MASS); library(earth)
data(etitanic)
lda.model <- lda(sex ~ ., data=etitanic)
plotd(lda.model)                                        # linear discriminant by default
plotd(lda.model, type="class", hist=TRUE, labels=TRUE)  # histogram of predicted classes
```
For these models, plotd handles type internally and does not pass it to predict.lda; type is used merely to select fields in the list returned by predict.lda.
The type names can be abbreviated down to a single character.
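The field selection and abbreviation rule can be illustrated with the hypothetical helper select_lda_field. It is not earth's code. It maps each documented type name to a component of the list that predict.lda returns (class, posterior and x), and it accepts unique abbreviations through pmatch. The iris lda fit is used only for illustration.

```r
# Hypothetical helper, not earth's code: choose a field of the predict.lda
# result according to a type name, allowing unique abbreviations.
library(MASS)  # lda() and its predict method

select_lda_field <- function(pred, type = NULL) {
  choices <- c("response", "ld", "class", "posterior")
  if (is.null(type)) type <- "response"        # NULL behaves like "response"
  type <- choices[pmatch(type, choices)]       # e.g. "p" -> "posterior"
  if (is.na(type)) stop("unrecognized type")
  switch(type,
         response = , ld = pred$x,             # linear discriminants
         class = pred$class,                   # predicted classes (a factor)
         posterior = pred$posterior)           # posterior probabilities
}

fit <- lda(Species ~ ., data = iris)
pred <- predict(fit)                 # list with components class, posterior and x
dim(select_lda_field(pred))          # 150 rows, 2 linear discriminants
head(select_lda_field(pred, "c"))    # predicted species
```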
For objects created with lda.matrix (as opposed to lda.formula), plotd blindly assumes that the grouping argument was the second argument.
plotd does not yet support objects created with lda.data.frame.
For lda responses with more than two factor levels, use the nresponse argument to select a column in the predicted response.
Thus with the default type=NULL (which plotd automatically converts to type="response"), use nresponse=1 to select just the first linear discriminant.
The default nresponse=NULL selects all columns, which is typically not what you want for lda models.
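To see what nresponse=1 selects, here is a base-R sketch (not plotd itself). It takes the first linear discriminant from predict.lda and estimates one density per observed class. It assumes an lda fit on the full built-in iris data, which differs from the model that example(lda) creates on a training subset.

```r
# Sketch: per-class densities of the first linear discriminant (illustrative).
library(MASS)
fit <- lda(Species ~ ., data = iris)
ld1 <- predict(fit)$x[, 1]             # the column nresponse = 1 would select
by.class <- split(ld1, iris$Species)   # partition by the observed species
dens <- lapply(by.class, density)

xlim <- range(sapply(dens, function(d) range(d$x)))
ylim <- c(0, max(sapply(dens, function(d) max(d$y))))
plot(dens[[1]], xlim = xlim, ylim = ylim, main = "", xlab = "LD1")
for (i in 2:3) lines(dens[[i]], col = i)
legend("topright", legend = names(dens), col = 1:3, lty = 1)
```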
Example:

```r
library(MASS); library(earth)
set.seed(1)                      # optional, for reproducibility
example(lda)                     # creates a model called "z"
plot(z, dimen=1)                 # invokes plot.lda from the MASS package
plotd(z, nresponse=1, hist=TRUE) # equivalent using plotd;
                                 # nresponse=1 selects the first linear discriminant
```
The dichot=TRUE argument is also useful for lda responses with more than two factor levels.
TODO
Handle degenerate densities in a more useful way.
Add a freq argument for hist.
Examples

```r
if (require(earth)) {
    old.par <- par(no.readonly=TRUE)
    # cex=0.8 comes after mfrow in the same call, since setting mfrow resets cex
    par(mfrow=c(2,2), mar=c(4, 3, 1.7, 0.5), mgp=c(1.6, 0.6, 0), cex=0.8)
    data(etitanic)
    mod <- earth(survived ~ ., data=etitanic, degree=2, glm=list(family=binomial))
    plotd(mod)
    plotd(mod, hist=TRUE, legend.pos=c(.25,220))
    plotd(mod, hist=TRUE, type="class", labels=TRUE, xlab="", xaxis.cex=.8)
    par(old.par)  # restore the original graphics settings
}
```