Fast global alignment kernels
Distance based on (triangular) global alignment kernels.
GAK(x, y, ..., sigma = NULL, window.size = NULL, normalize = TRUE, error.check = TRUE)

gak(x, y, ..., sigma = NULL, window.size = NULL, normalize = TRUE, error.check = TRUE)
x, y | Time series. A multivariate series should have time spanning the rows and variables spanning the columns.
... | Currently ignored.
sigma | Parameter for the Gaussian kernel's width. See details for the interpretation of NULL.
window.size | Parameterization of the constraining band (T in Cuturi (2011)). See details.
normalize | Normalize the result by considering diagonal terms.
error.check | Logical indicating whether the function should try to detect inconsistencies and give more informative error messages. Also used internally to avoid repeating checks.
This function uses the Triangular Global Alignment Kernel (TGAK) described in Cuturi (2011). It supports multivariate series and series of different length, as long as the ratio of the series' lengths is between 0.5 and 2.
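The length-ratio condition can be checked before calling the function. The following is a minimal base-R sketch; length_ratio_ok is a hypothetical helper that is not part of dtwclust, and treating the bounds 0.5 and 2 as inclusive is an assumption based on the wording above.

```r
# Hypothetical helper (not part of dtwclust): checks whether the ratio of the
# series' lengths lies between 0.5 and 2. Inclusive bounds are an assumption.
length_ratio_ok <- function(x, y) {
  # NROW() handles both vectors and matrices with time spanning the rows
  ratio <- NROW(x) / NROW(y)
  ratio >= 0.5 && ratio <= 2
}

length_ratio_ok(numeric(100), numeric(60))  # TRUE: the ratio is about 1.67
length_ratio_ok(numeric(100), numeric(40))  # FALSE: the ratio is 2.5
```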
The window.size parameter is similar to the one used in DTW, so NULL signifies no constraint, and its value should be greater than 1 if used with series of different length.
The Gaussian kernel is parameterized by sigma. Providing NULL means that the value will be estimated by using the strategy mentioned in Cuturi (2011) with a constant of 1. This estimation is subject to randomness, so consider estimating the value once and re-using it (the estimate is returned as an attribute of the result). See the examples.
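For a direct call to GAK(), re-using the estimate might look like the sketch below. It assumes dtwclust is installed and that the attribute is named "sigma", as it is for the proxy::dist() result in the examples; the choice of series and of window.size = 18L is purely illustrative.

```r
# Assumes the dtwclust package is installed; the series and the window size
# are arbitrary choices made for illustration.
library(dtwclust)
data(uciCT)

series <- zscore(CharTraj[1L:3L])

# First call: sigma = NULL (the default), so the value is estimated.
# The estimation is random, hence the seed.
set.seed(832)
first <- GAK(series[[1L]], series[[2L]], window.size = 18L)

# Retrieve the estimate (attribute name assumed to be "sigma")
sigma_hat <- attr(first, "sigma")

# Later calls re-use the estimate instead of estimating it again
second <- GAK(series[[1L]], series[[3L]], sigma = sigma_hat, window.size = 18L)
```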
For more information, refer to the package vignette and the referenced article.
Returns the logarithm of the GAK if normalize = FALSE, otherwise 1 minus the normalized GAK. The value of sigma is assigned as an attribute of the result.
The version registered with dist is custom (loop = FALSE in pr_DB). The custom function handles multi-threaded parallelization directly (with RcppParallel). It uses all available threads by default (see RcppParallel::defaultNumThreads()), but this can be changed by the user with RcppParallel::setThreadOptions().
An exception to the above is when it is called within a foreach parallel loop made by dtwclust. If the parallel workers do not have the number of threads explicitly specified, this function will default to 1 thread per worker. See the parallelization vignette for more information (browseVignettes("dtwclust")).
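As a short sketch of the thread settings mentioned above, assuming RcppParallel is installed (the value 2L is arbitrary):

```r
# Assumes the RcppParallel package is installed.
library(RcppParallel)

# Number of threads used when nothing else is specified
defaultNumThreads()

# Limit subsequent RcppParallel-based computations to 2 threads
# (2L is an arbitrary value chosen for illustration)
setThreadOptions(numThreads = 2L)
```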
It also includes symmetric optimizations to calculate only half a distance matrix when appropriate, which requires that only one list of series be provided in x. If you want to avoid this optimization, call dist by giving the same list of series in both x and y.
The estimation of sigma does not depend on window.size.
If normalize is set to FALSE, the returned value is not a distance but a similarity. The proxy::dist() version is therefore always normalized. Use proxy::simil() with method set to "uGAK" if you want the unnormalized similarities.
A constrained unnormalized calculation (i.e. with window.size > 0 and normalize = FALSE) will return negative infinity if abs(NROW(x) - NROW(y)) > window.size. Since the function won't perform calculations in that case, it might be faster, but if this behavior is not desired, consider reinterpolating the time series (see reinterpolate()) or increasing the window size.
Cuturi, M. (2011). Fast global alignment kernels. In Proceedings of the 28th international conference on machine learning (ICML-11) (pp. 929-936).
## Not run: 
data(uciCT)

set.seed(832)
GAKd <- proxy::dist(zscore(CharTraj), method = "gak",
                    pairwise = TRUE, window.size = 18L)

# Obtained estimate of sigma
sigma <- attr(GAKd, "sigma")

# Use value for clustering
tsclust(CharTraj, k = 20L,
        distance = "gak", centroid = "shape",
        trace = TRUE,
        args = tsclust_args(dist = list(sigma = sigma,
                                        window.size = 18L)))
## End(Not run)

# Unnormalized similarities
proxy::simil(CharTraj[1L:5L], method = "ugak")