Enhanced Scatterplots with Marginal Boxplots, Point Marking, Smoothers, and More
This function uses basic R graphics to draw a two-dimensional scatterplot, with options to allow for plot enhancements that are often helpful with regression problems. Enhancements include adding marginal boxplots, estimated mean and variance functions using either parametric or nonparametric methods, point identification, jittering, setting characteristics of points and lines like color, size and symbol, marking points and fitting lines conditional on a grouping variable, and other enhancements.
sp is an abbreviation for scatterplot.
scatterplot(x, ...)

## S3 method for class 'formula'
scatterplot(formula, data, subset, xlab, ylab, id=FALSE,
    legend=TRUE, ...)

## Default S3 method:
scatterplot(x, y, boxplots=if (by.groups) "" else "xy",
    regLine=TRUE, legend=TRUE, id=FALSE, ellipse=FALSE, grid=TRUE,
    smooth=TRUE, groups, by.groups=!missing(groups),
    xlab=deparse(substitute(x)), ylab=deparse(substitute(y)),
    log="", jitter=list(), cex=par("cex"),
    col=carPalette()[-1], pch=1:n.groups,
    reset.par=TRUE, ...)

sp(x, ...)
x |
vector of horizontal coordinates (or first argument of generic function). |
y |
vector of vertical coordinates. |
formula |
a model formula, of the form |
data |
data frame within which to evaluate the formula. |
subset |
expression defining a subset of observations. |
boxplots |
if |
regLine |
controls adding a fitted regression line to the plot. If
|
legend |
when the plot is drawn by groups and |
id |
controls point identification; if |
ellipse |
controls plotting data-concentration ellipses. If |
grid |
If TRUE, the default, a light-gray background grid is put on the graph |
smooth |
specifies a nonparametric estimate of the mean or median
function of the vertical axis variable given the
horizontal axis variable and optionally a nonparametric estimate of the conditional variance. If
|
groups |
a factor or other variable dividing the data into groups; groups are plotted with different colors, plotting characters, fits, and smooths. Using this argument is equivalent to specifying the grouping variable in the formula. |
by.groups |
if |
xlab |
label for horizontal axis. |
ylab |
label for vertical axis. |
log |
same as the |
jitter |
a list with elements |
col |
with no grouping, this specifies a color for plotted points;
with grouping, this argument should be a vector
of colors of length at least equal to the number of groups. The default is the
value returned by |
pch |
plotting characters for points; default is the plotting characters in
order (see |
cex |
sets the size of plotting characters, with |
reset.par |
if |
... |
other arguments passed down and to |
Many arguments to scatterplot were changed in version 3 of car to simplify use of this function.
The smooth argument is usually set to TRUE or FALSE to draw or omit the smoother. Alternatively, smooth can be set to a list of arguments. The default behavior of smooth=TRUE is equivalent to smooth=list(smoother=loessLine, var=!by.groups, lty.var=2, lty.var=4), specifying the smoother to be used, whether to include the variance smooth, and the line widths and types for the curves. You can also specify the colors to use for the mean and variance smooths with the arguments col.smooth and col.var. Alternative smoothers are gamLine, which uses the gam function from the mgcv package, and quantregLine, which uses quantile regression to estimate the median and quartile functions using rqss from the quantreg package. All of these smoothers have one or more arguments described on their help pages, and these arguments can be added to the smooth argument; for example, smooth = list(span=1/2) would use the default loessLine smoother, include the variance smooth, and change the value of the smoothing parameter to 1/2.
For loessLine and gamLine, the variance smooth is estimated by separately smoothing the squared positive and negative residuals from the mean smooth, using the same type of smoother. The displayed curves are equal to the mean smooth plus the square root of the smooth of the positive squared residuals, and the mean smooth minus the square root of the smooth of the negative squared residuals. The lines therefore represent the conditional variability at each value on the horizontal axis. Because smoothing is done separately for positive and negative residuals, the variation shown will generally not be symmetric about the fitted mean function. For the quantregLine method, the center estimates the median for each value on the horizontal axis, and the variability estimates the lower and upper quartiles of the estimated conditional distribution for each value on the horizontal axis.
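The variance-smooth construction described above can be sketched in base R. This is only an illustration of the idea, not the code used by loessLine: the simulated data, the span of 2/3, the local-linear fit (degree = 1), and all variable names are assumptions made for the example.

```r
# Base-R sketch of an asymmetric variance smooth; NOT car's internal code.
set.seed(123)                       # simulated data, purely illustrative
n <- 200
d <- data.frame(x = sort(runif(n, 0, 10)))
d$y <- 2 + 0.5 * d$x + rnorm(n, sd = 0.3 + 0.1 * d$x)

span <- 2/3                         # assumed smoothing parameter

# 1. Mean smooth and its residuals.
mean_fit <- loess(y ~ x, data = d, span = span, degree = 1)
d$res <- d$y - predict(mean_fit, newdata = d)
d$sq  <- d$res^2

# 2. Smooth the squared residuals separately for each sign.
pos <- d[d$res > 0, ]
neg <- d[d$res < 0, ]
pos_fit <- loess(sq ~ x, data = pos, span = span, degree = 1)
neg_fit <- loess(sq ~ x, data = neg, span = span, degree = 1)

# 3. Evaluate on a grid inside the x range shared by both subsets,
#    because loess() does not extrapolate by default.
grid <- data.frame(x = seq(max(min(pos$x), min(neg$x)),
                           min(max(pos$x), max(neg$x)),
                           length.out = 50))
center <- predict(mean_fit, newdata = grid)

# pmax() guards against slightly negative smoothed values before sqrt().
upper <- center + sqrt(pmax(predict(pos_fit, newdata = grid), 0))
lower <- center - sqrt(pmax(predict(neg_fit, newdata = grid), 0))

# Draw the result only in an interactive session.
if (interactive()) {
  plot(y ~ x, data = d, col = "gray50")
  lines(grid$x, center, lwd = 2)
  lines(grid$x, upper, lty = 2)
  lines(grid$x, lower, lty = 2)
}
```

Because the upper and lower curves come from two separate fits, their distances from the center line generally differ, which is the asymmetry noted above.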
The sub-arguments spread, lty.spread, and col.spread of the smooth argument are equivalent to the newer var, lty.var, and col.var, respectively, recognizing that the spread is a measure of conditional variability.
If points are identified, their labels are returned; otherwise NULL is returned invisibly.
John Fox jfox@mcmaster.ca
Fox, J. and Weisberg, S. (2019) An R Companion to Applied Regression, Third Edition, Sage.
scatterplot(prestige ~ income, data=Prestige, ellipse=TRUE)

scatterplot(prestige ~ income, data=Prestige,
    smooth=list(smoother=quantregLine))

# use quantile regression for median and quartile fits
scatterplot(prestige ~ income | type, data=Prestige,
    smooth=list(smoother=quantregLine, var=TRUE, span=1, lwd=4, lwd.var=2))

scatterplot(prestige ~ income | type, data=Prestige,
    legend=list(coords="topleft"))

scatterplot(vocabulary ~ education, jitter=list(x=1, y=1),
    data=Vocab, smooth=FALSE, lwd=3)

scatterplot(infantMortality ~ ppgdp, log="xy", data=UN, id=list(n=5))

scatterplot(income ~ type, data=Prestige)

## Not run:
# remember to exit from point-identification mode
scatterplot(infantMortality ~ ppgdp, id=list(method="identify"), data=UN)

## End(Not run)