DTW Barycenter Averaging
A global averaging method for time series under DTW (Petitjean, Ketterlin and Gancarski 2011).
DBA(X, centroid = NULL, ..., window.size = NULL, norm = "L1",
    max.iter = 20L, delta = 0.001, error.check = TRUE, trace = FALSE,
    mv.ver = "by-variable")

dba(X, centroid = NULL, ..., window.size = NULL, norm = "L1",
    max.iter = 20L, delta = 0.001, error.check = TRUE, trace = FALSE,
    mv.ver = "by-variable")
X
A matrix or data frame where each row is a time series, or a list where each element is a time series. Multivariate series should be provided as a list of matrices where time spans the rows and the variables span the columns of each matrix.

centroid
Optionally, a time series to use as reference. Defaults to a random series of

...
Further arguments for

window.size
Window constraint for the DTW calculations.

norm
Norm for the local cost matrix of DTW. Either "L1" for Manhattan distance or "L2" for Euclidean distance.

max.iter
Maximum number of iterations allowed.

delta
At iteration

error.check
Logical indicating whether the function should try to detect inconsistencies and give more informative error messages. Also used internally to avoid repeating checks.

trace
If

mv.ver
Multivariate version to use. See below.
This function tries to find the optimum average series between a group of time series in DTW space. Refer to the cited article for specific details on the algorithm.
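The averaging step can be illustrated with a minimal base-R sketch. This is not the dtwclust implementation: the helper names dtw_path and dba_sketch are hypothetical, no window constraint is applied, the L1 local cost is hard-coded, and the stopping rule (largest absolute change in the centroid below delta) is an assumption made for illustration. Each pass aligns every series to the current centroid with DTW and replaces each centroid point with the mean of the values aligned to it.

```r
# Unconstrained DTW with L1 local cost; returns the aligned index pairs.
dtw_path <- function(x, y) {
  n <- length(x)
  m <- length(y)
  cost <- abs(outer(x, y, "-"))
  # Accumulated cost matrix with an extra border row/column of Inf
  acc <- matrix(Inf, n + 1L, m + 1L)
  acc[1L, 1L] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      acc[i + 1L, j + 1L] <- cost[i, j] +
        min(acc[i, j], acc[i, j + 1L], acc[i + 1L, j])
    }
  }
  # Backtrack from the last cell to the first one
  i <- n
  j <- m
  index1 <- i
  index2 <- j
  while (i > 1L || j > 1L) {
    # Candidate predecessors: diagonal, up (previous x), left (previous y)
    k <- which.min(c(acc[i, j], acc[i, j + 1L], acc[i + 1L, j]))
    if (k == 1L) {
      i <- i - 1L
      j <- j - 1L
    } else if (k == 2L) {
      i <- i - 1L
    } else {
      j <- j - 1L
    }
    index1 <- c(i, index1)
    index2 <- c(j, index2)
  }
  list(index1 = index1, index2 = index2)
}

# Hypothetical helper: repeat the barycenter update until the centroid settles.
dba_sketch <- function(series, centroid, max_iter = 20L, delta = 1e-3) {
  for (iter in seq_len(max_iter)) {
    sums <- numeric(length(centroid))
    counts <- numeric(length(centroid))
    for (x in series) {
      path <- dtw_path(x, centroid)
      # Add every value of x to the centroid point it is aligned with
      for (k in seq_along(path$index1)) {
        p <- path$index2[k]
        sums[p] <- sums[p] + x[path$index1[k]]
        counts[p] <- counts[p] + 1
      }
    }
    new_centroid <- sums / counts
    # Assumed stopping rule: no centroid point moved by delta or more
    converged <- all(abs(new_centroid - centroid) < delta)
    centroid <- new_centroid
    if (converged) break
  }
  centroid
}

# Three sine waves of different lengths and phases, averaged around the first
series <- list(
  sin(seq(0, 2 * pi, length.out = 30)),
  sin(seq(0, 2 * pi, length.out = 35) + 0.3),
  sin(seq(0, 2 * pi, length.out = 25) - 0.3)
)
avg <- dba_sketch(series, centroid = series[[1]])
```

The result always has the length of the reference centroid, because only the values assigned to each centroid point change between passes.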
If a reference series is provided in centroid, the algorithm should always converge to the same result provided the elements of X keep the same values, although their order may change.
The windowing constraint uses a centered window. The calculations expect a value in window.size that represents the distance between the point considered and one of the edges of the window. Therefore, if, for example, window.size = 10, the warping for an observation x_i considers the points between x_{i-10} and x_{i+10}, resulting in 10(2) + 1 = 21 observations falling within the window.
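The window arithmetic can be checked with a few lines of base R. The series length n and the position i below are hypothetical values chosen for illustration, and clipping the window at the ends of the series is an assumption of this sketch rather than a statement about the package internals.

```r
window_size <- 10L
n <- 100L  # hypothetical series length
i <- 50L   # hypothetical position of the observation x_i

# Indices considered for x_i: from i - window_size to i + window_size,
# clipped to the valid range 1..n (assumption for this sketch)
window_idx <- seq(max(1L, i - window_size), min(n, i + window_size))

# Away from the edges the window holds 2 * window_size + 1 observations
length(window_idx)
```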
The function returns the average time series.
Please note that running tasks in parallel does not guarantee faster computations. The overhead introduced is sometimes too large, and it's better to run tasks sequentially.
This function uses the RcppParallel package for parallelization. It uses all available threads by default (see RcppParallel::defaultNumThreads()), but this can be changed by the user with RcppParallel::setThreadOptions().
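As a small sketch of how the thread count might be adjusted, the snippet below restricts RcppParallel to a single thread. The variable n_threads is a name chosen here for illustration, and the requireNamespace() guard is only there so the snippet does nothing when RcppParallel is not installed.

```r
n_threads <- 1L  # illustrative choice: run sequentially

if (requireNamespace("RcppParallel", quietly = TRUE)) {
  # Number of threads RcppParallel would use by default on this machine
  print(RcppParallel::defaultNumThreads())

  # Restrict subsequent RcppParallel-based computations to n_threads
  RcppParallel::setThreadOptions(numThreads = n_threads)
}
```

Calling RcppParallel::setThreadOptions(numThreads = "auto") afterwards returns to the default behaviour.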
An exception to the above is when this function is called within a foreach parallel loop made by dtwclust. If the parallel workers do not have the number of threads explicitly specified, this function will default to 1 thread per worker. See the parallelization vignette for more information (browseVignettes("dtwclust")).
This function appears to be very sensitive to numerical inaccuracies if multi-threading is used in a 32-bit installation. On such systems, consider limiting calculations to 1 thread.
There are currently 2 versions of DBA implemented for multivariate series (see examples):
If mv.ver = "by-variable", then each variable of each series in X and centroid are extracted, and the univariate version of the algorithm is applied to each set of variables, binding the results by column. Therefore, the DTW backtracking is different for each variable.

If mv.ver = "by-series", then all variables are considered at the same time, so the DTW backtracking is computed based on each multivariate series as a whole. This version was implemented in version 4.0.0 of dtwclust, and it is faster, but not necessarily more correct.
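A short sketch of calling both versions on the same data follows. It assumes dtwclust is installed and uses the CharTrajMV series from the uciCT data set; the result names avg_by_series and avg_by_variable are chosen here for illustration, and the guard simply skips the computation when the package is missing.

```r
avg_by_series <- NULL
avg_by_variable <- NULL

if (requireNamespace("dtwclust", quietly = TRUE)) {
  # Load CharTrajMV (list of multivariate series) into the current environment
  data("uciCT", package = "dtwclust", envir = environment())

  X <- CharTrajMV[1L:5L]    # sample set of multivariate series
  cent <- CharTrajMV[[3L]]  # reference centroid

  # One alignment per multivariate series as a whole
  avg_by_series <- dtwclust::DBA(X, cent, mv.ver = "by-series")

  # One alignment per variable, results bound by column
  avg_by_variable <- dtwclust::DBA(X, cent, mv.ver = "by-variable")

  # The two averages generally differ because the alignments differ
  print(all.equal(avg_by_series, avg_by_variable, check.attributes = FALSE))
}
```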
The indices of the DTW alignment are obtained by calling dtw_basic() with backtrack = TRUE.
Petitjean F, Ketterlin A and Gancarski P (2011). “A global averaging method for dynamic time warping, with applications to clustering.” Pattern Recognition, 44(3), pp. 678-693. ISSN 0031-3203, doi: 10.1016/j.patcog.2010.09.013, https://www.sciencedirect.com/science/article/pii/S003132031000453X.
library(dtwclust)

# Sample data
data(uciCT)

# Obtain an average for the first 5 time series
dtw_avg <- DBA(CharTraj[1:5], CharTraj[[1]], trace = TRUE)

# Plot
matplot(do.call(cbind, CharTraj[1:5]), type = "l")
points(dtw_avg)

# Change the provided order
dtw_avg2 <- DBA(CharTraj[5:1], CharTraj[[1]], trace = TRUE)

# Same result?
all.equal(dtw_avg, dtw_avg2)

## Not run: 
# ====================================================================================
# Multivariate versions
# ====================================================================================

# sample centroid reference
cent <- CharTrajMV[[3L]]
# sample series
x <- CharTrajMV[[1L]]
# sample set of series
X <- CharTrajMV[1L:5L]

# the by-series version does something like this for each series and the centroid
alignment <- dtw_basic(x, cent, backtrack = TRUE)
# alignment$index1 and alignment$index2 indicate how to map x to cent (row-wise)

# the by-variable version treats each variable separately
alignment1 <- dtw_basic(x[,1L], cent[,1L], backtrack = TRUE)
alignment2 <- dtw_basic(x[,2L], cent[,2L], backtrack = TRUE)
alignment3 <- dtw_basic(x[,3L], cent[,3L], backtrack = TRUE)

# effectively doing:
X1 <- lapply(X, function(x) { x[,1L] })
X2 <- lapply(X, function(x) { x[,2L] })
X3 <- lapply(X, function(x) { x[,3L] })

dba1 <- dba(X1, cent[,1L])
dba2 <- dba(X2, cent[,2L])
dba3 <- dba(X3, cent[,3L])

new_cent <- cbind(dba1, dba2, dba3)

# sanity check
newer_cent <- dba(X, cent, mv.ver = "by-variable")
all.equal(newer_cent, new_cent, check.attributes = FALSE) # ignore names

## End(Not run)