LOWESS Smoother with Prior Weights
This function generalizes the original LOWESS smoother (locally-weighted regression) to incorporate prior weights while preserving the original algorithm design and efficiency as closely as possible.
weightedLowess(x, y, weights = NULL, delta = NULL, npts = 200, span = 0.3, iterations = 4, output.style = "loess")
x: a numeric vector of values for the covariate or x-axis coordinates.
y: a numeric vector of response values or y-axis coordinates, of same length as
weights: a numeric vector containing non-negative prior weights, of same length as
delta: a numeric scalar specifying the maximum distance between successive anchor x-values where a local regression will be computed. Roughly corresponds to
npts: an integer scalar specifying the approximate number of anchor x-values at which local regressions will be computed. Ignored if
span: a numeric scalar between 0 and 1 specifying the width of the smoothing window as a proportion of the total weight.
iterations: an integer scalar specifying the number of iterations.
output.style: character string indicating whether the output should be in the style of
This function extends the LOWESS algorithm of Cleveland (1979, 1981) to handle non-negative prior weights.
The LOWESS method consists of computing a series of local linear regressions, with each local regression restricted to a window of x-values. Smoothness is achieved by using overlapping windows and by gradually down-weighting points in each regression according to their distance from the anchor point of the window (tri-cube weighting).
To conserve running time and memory, locally-weighted regressions are computed at only a limited number of anchor x-values, either npts or the number of distinct x-values, whichever is smaller.
Anchor points are defined exactly as in the original LOWESS algorithm.
Any x-value within distance delta of an anchor point is considered adjacent to it.
The first anchor point is min(x).
With the x-values sorted in ascending order, successive anchor points are defined as follows.
The next anchor point is the smallest x-value not adjacent to any previous anchor point.
The last anchor point is max(x).
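The anchor-point rules above can be sketched in base R as follows. This is an illustrative re-implementation under stated assumptions, not the package's C code: select_anchors is a hypothetical helper name, and delta is assumed to be supplied directly rather than derived from npts.

```r
# Hypothetical helper: pick anchor x-values following the LOWESS rules.
# Assumption: delta is supplied directly; it is not derived from npts here.
select_anchors <- function(x, delta) {
  xs <- sort(unique(x))
  anchors <- xs[1]                    # first anchor is min(x)
  last <- xs[1]
  for (xi in xs[-1]) {
    # xs is ascending, so xi is non-adjacent to every previous anchor
    # exactly when it lies more than delta beyond the most recent one
    if (xi - last > delta) {
      anchors <- c(anchors, xi)
      last <- xi
    }
  }
  n <- length(xs)
  if (last != xs[n]) anchors <- c(anchors, xs[n])  # last anchor is max(x)
  anchors
}

select_anchors(c(0, 0.1, 0.25, 0.3, 0.7, 1.0), delta = 0.2)  # returns c(0, 0.25, 0.7, 1)
```

In this sketch, delta = 0 makes every distinct x-value an anchor, and a larger delta yields fewer anchors and therefore fewer local regressions.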
For each anchor point, a weighted linear regression is performed for a window of neighboring points.
The neighboring points consist of the smallest set of closest neighbors such that the sum of their weights is greater than or equal to span times the total weight of all points.
Each local regression produces a fitted value for that anchor point.
Fitted values for other x-values are then obtained by linear interpolation between anchor points.
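A single non-robust pass can be sketched in base R as below: a weighted window around each anchor, a weighted least-squares line using prior weights times tri-cube weights, and linear interpolation with approx(). The helper names local_fit and smooth_at_anchors are hypothetical, and taking the window half-width as the distance to the last neighbor needed to reach span times the total weight is an assumption of this sketch, not a description of the package's C code.

```r
# Hypothetical helpers illustrating one weighted LOWESS pass (no robustness).
# Assumption: window half-width h = distance to the farthest neighbour needed
# for the cumulative prior weight to reach span * sum(w).
local_fit <- function(x0, x, y, w, span) {
  d <- abs(x - x0)
  o <- order(d)
  k <- which(cumsum(w[o]) >= span * sum(w))[1]  # smallest set of closest neighbours
  if (is.na(k)) k <- length(x)                  # guard against rounding when span = 1
  h <- d[o][k]                                  # window half-width
  u <- if (h > 0) pmin(d / h, 1) else as.numeric(d > 0)
  tw <- w * (1 - u^3)^3                         # prior weight times tri-cube weight
  sw <- sum(tw)
  xbar <- sum(tw * x) / sw
  ybar <- sum(tw * y) / sw
  sxx <- sum(tw * (x - xbar)^2)
  if (sxx <= 0) return(ybar)                    # no x-spread: use the weighted mean
  b <- sum(tw * (x - xbar) * (y - ybar)) / sxx  # weighted least-squares slope
  ybar + b * (x0 - xbar)                        # fitted value at the anchor
}

smooth_at_anchors <- function(x, y, w, anchors, span) {
  fits <- vapply(anchors, local_fit, numeric(1), x = x, y = y, w = w, span = span)
  if (length(anchors) == 1) return(rep(fits, length(x)))
  approx(anchors, fits, xout = x)$y             # linear interpolation between anchors
}

x <- seq(0, 1, length.out = 21)
y <- 2 + 3 * x
w <- rep(c(1, 2, 3), length.out = 21)
smooth_at_anchors(x, y, w, x[c(1, 6, 11, 16, 21)], span = 0.3)  # reproduces the straight line y
```

Because each local fit is linear, exactly linear data are reproduced regardless of the prior weights, which makes a convenient sanity check for the sketch.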
For the first iteration, the local linear regressions use weights equal to the prior weights times the tri-cube distance weights. Subsequent iterations multiply these weights by robustifying weights. Points with residuals greater than 6 times the median absolute residual are assigned weights of zero; otherwise, Tukey's biweight function is applied to the residuals to obtain the robust weights. More iterations produce greater robustness.
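The robustifying step described above can be sketched as a small base R function. robust_weights is a hypothetical name; using an unweighted median of the absolute residuals, and returning unit weights when that median is zero, are assumptions of this sketch rather than documented behavior.

```r
# Hypothetical helper: robustifying weights from residuals.
# Assumptions: unweighted median of |residuals|; a zero median (perfect fit)
# keeps every point at full weight.
robust_weights <- function(res) {
  cmad <- 6 * median(abs(res))         # 6 times the median absolute residual
  if (cmad == 0) return(rep(1, length(res)))
  u <- abs(res) / cmad
  ifelse(u < 1, (1 - u^2)^2, 0)        # Tukey's biweight; zero beyond the cutoff
}

# For the next iteration, each local regression would use
# prior weight * tri-cube weight * robust_weights(residuals)
robust_weights(c(0, 1, -1, 2, 100))    # 1, (35/36)^2, (35/36)^2, (8/9)^2, 0
```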
In summary, the prior weights are used in two ways. First, the prior weights are used during the span calculations such that the points included in the window for each local regression must account for the specified proportion of the total sum of weights. Second, the weights used for the local regressions are the product of the prior weights, tri-cube local weights and biweight robustifying weights. Hence a point with prior weight equal to an integer n has the same influence as n points with unit weight and the same x and y-values.
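The integer-weight equivalence can be checked for the regression step with base R's lm(). This sketch does not call weightedLowess; it only shows that a prior weight of 3 on one point gives the same weighted least-squares coefficients as repeating that point three times. The span calculation behaves the same way because it depends only on sums of weights.

```r
# A prior weight of 3 on the first point versus three copies of that point.
set.seed(1)
x <- runif(10)
y <- 1 + 2 * x + rnorm(10)
w <- c(3, rep(1, 9))

fit_weighted <- lm(y ~ x, weights = w)

x_rep <- c(rep(x[1], 3), x[-1])
y_rep <- c(rep(y[1], 3), y[-1])
fit_replicated <- lm(y_rep ~ x_rep)

all.equal(unname(coef(fit_weighted)), unname(coef(fit_replicated)))  # TRUE
```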
See also loessFit, which is essentially a wrapper function for lowess and weightedLowess with added error checking.
Relationship to lowess and loess
The stats package provides two functions, lowess and loess.
lowess implements the original LOWESS algorithm of Cleveland (1979, 1981), designed for scatterplot smoothing with a single x-variable, while loess implements the more complex algorithm of Cleveland et al (1988, 1992), designed to fit multivariate surfaces.
The loess algorithm is more general than lowess in a number of ways, notably because it allows prior weights and up to four numeric predictors.
On the other hand, loess is necessarily slower and uses more memory than lowess.
Furthermore, it has less accurate interpolation than lowess because it uses a cruder algorithm to choose the anchor points, whereby anchor points are equi-spaced in terms of numbers of points rather than in terms of x-value spacing.
lowess and loess also have different defaults and input parameters.
See Smyth (2003) for a detailed discussion.
Another difference between lowess and loess is that lowess returns the x and y coordinates of the fitted curve, with x in ascending order, whereas loess returns fitted values and residuals in the original data order.
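To make the difference in output order concrete, the following base R sketch converts lowess-style output (sorted by x) into loess-style fitted values and residuals in the original data order. It uses only stats::lowess; the variable names are illustrative.

```r
# Convert lowess-style output (sorted by x) to loess-style output
# (fitted values and residuals in the original data order).
set.seed(2)
x <- runif(20)
y <- sin(4 * x) + rnorm(20, sd = 0.2)

lw <- lowess(x, y, f = 0.5)   # lw$x is sorted ascending; lw$y are the smoothed values
o <- order(x)                 # for distinct x-values, o maps sorted positions back
fit_orig <- numeric(length(x))
fit_orig[o] <- lw$y           # undo the sort
res_orig <- y - fit_orig      # residuals in the original order
```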
The purpose of the current function is to incorporate prior weights while keeping the algorithmic advantages of the original lowess code for scatterplot smoothing.
The current function therefore generalizes the span and interpolation concepts of lowess differently from loess.
When output.style="loess", weightedLowess outputs results in the original order, similar to loessFit and loess.
When output.style="lowess", weightedLowess outputs results in sorted order, the same as lowess.
The span argument corresponds to the f argument of lowess and the span argument of loess.
The delta argument is the same as the delta argument of lowess.
The npts argument is new and amounts to a more convenient way to specify delta.
The iterations argument is the same as the corresponding argument of loess and is equivalent to iter+1, where iter is the lowess argument.
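The argument correspondence above can be sketched as a thin wrapper around stats::lowess. lowess_wl_args is a hypothetical name and is not part of stats or limma; prior weights are omitted because lowess has no weights argument, and the default delta here is simply lowess's own default (0.01 times the range of x), not a value derived from npts.

```r
# Hypothetical wrapper (not part of stats or limma): call stats::lowess
# using weightedLowess-style argument names. No prior weights are supported.
lowess_wl_args <- function(x, y, span = 0.3, iterations = 4,
                           delta = 0.01 * diff(range(x))) {
  # span -> f, iterations -> iter + 1 (so iterations = 1 means no robustifying step)
  stats::lowess(x, y, f = span, iter = iterations - 1, delta = delta)
}

set.seed(3)
x <- runif(50)
y <- x^2 + rnorm(50, sd = 0.1)
lw <- lowess_wl_args(x, y, span = 0.5, iterations = 4)  # same as lowess(x, y, f = 0.5, iter = 3)
```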
If output.style="loess", then a list with the following components:
fitted: numeric vector of smoothed y-values (in the same order as the input vectors).
residuals: numeric vector of residuals.
weights: numeric vector of robustifying weights used in the most recent iteration.
delta: the delta used, either the input value or the value derived from
If output.style="lowess", then a list with the following components:
x: numeric vector of x-values in ascending order.
y: numeric vector of smoothed y-values.
delta: the delta used, either the input value or the value derived from
C code and R function by Aaron Lun.
Cleveland, W.S. (1979). Robust Locally Weighted Regression and Smoothing Scatterplots. Journal of the American Statistical Association 74(368), 829-836.
Cleveland, W.S. (1981). LOWESS: A program for smoothing scatterplots by robust locally weighted regression. The American Statistician 35(1), 54.
Cleveland, W.S., and Devlin, S.J. (1988). Locally-weighted regression: an approach to regression analysis by local fitting. Journal of the American Statistical Association 83(403), 596-610.
Cleveland, W.S., Grosse, E., and Shyu, W.M. (1992). Local regression models. Chapter 8 in: Statistical Models in S, edited by J.M. Chambers and T.J. Hastie. Chapman & Hall/CRC, Boca Raton.
Smyth, G.K. (2003). lowess vs. loess. Answer on the Bioconductor Support forum, https://support.bioconductor.org/p/2323/.
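library(limma)  # weightedLowess is provided by the limma package
y <- rt(100, df = 4)  # heavy-tailed responses
x <- runif(100)
w <- runif(100)       # random prior weights
l <- weightedLowess(x, y, w, span = 0.7, output.style = "lowess")
plot(x, y, cex = w)   # symbol size scaled by prior weight
lines(l, col = "red")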