Spectra Distance/Similarity Measurements
These functions provide different normalized similarity/distance measurements.
ndotproduct(x, y, m = 0L, n = 0.5, na.rm = TRUE, ...)
dotproduct(x, y, m = 0L, n = 0.5, na.rm = TRUE, ...)
neuclidean(x, y, m = 0L, n = 0.5, na.rm = TRUE, ...)
navdist(x, y, m = 0L, n = 0.5, na.rm = TRUE, ...)
nspectraangle(x, y, m = 0L, n = 0.5, na.rm = TRUE, ...)
x |
|
y |
|
m |
|
n |
|
na.rm |
|
... |
ignored. |
All functions that calculate normalized similarity/distance measurements are prefixed with n.
ndotproduct: the normalized dot product is described in Stein and Scott 1994 as: NDP = \frac{(\sum W_1 W_2)^2}{\sum (W_1)^2 \sum (W_2)^2}; where W_i = x^m * y^n, and x and y are the m/z and intensity values, respectively. Stein and Scott 1994 empirically determined the optimal exponents as m = 3 and n = 0.6 by analyzing ca. 12000 EI-MS spectra of 8000 organic compounds in the NIST Mass Spectral Library. MassBank (Horai et al. 2010) uses m = 2 and n = 0.5 for small compounds. In general, increasing m gives high m/z values more weight in the similarity calculation. Especially when working with small molecules, a value m > 0 can be set to weight the m/z values, because shared fragments with higher m/z are less likely and therefore suggest that the molecules are more similar. Increasing n gives the intensity values more weight. Most commonly m = 0 and n = 0.5 are used.
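The weighting and the normalized dot product can be sketched in base R as follows. This is a minimal illustration derived from the formula above, not the package's code: the names weight_peaks and ndp_sketch are hypothetical, and the sketch assumes that x and y are already aligned row by row and contain no missing values.

```r
# Hypothetical helper (not part of MsCoreUtils):
# weight each peak as W = mz^m * intensity^n.
weight_peaks <- function(peaks, m = 0, n = 0.5) {
    peaks[, 1]^m * peaks[, 2]^n
}

# Sketch of the normalized dot product. Assumes x and y are two-column
# matrices (m/z, intensity) with matching rows and no NA values.
ndp_sketch <- function(x, y, m = 0, n = 0.5) {
    wx <- weight_peaks(x, m, n)
    wy <- weight_peaks(y, m, n)
    sum(wx * wy)^2 / (sum(wx^2) * sum(wy^2))
}

x <- matrix(c(1:5, 1:5), ncol = 2, dimnames = list(NULL, c("mz", "intensity")))
y <- matrix(c(1:5, 5:1), ncol = 2, dimnames = list(NULL, c("mz", "intensity")))

ndp_sketch(x, x)                  # identical spectra: 1, up to rounding
ndp_sketch(x, y)                  # m = 0: only intensities contribute
ndp_sketch(x, y, m = 2, n = 0.5)  # exponents used by MassBank
ndp_sketch(x, y, m = 3, n = 0.6)  # exponents from Stein and Scott 1994
```

With m = 0 the m/z column drops out of the weights entirely, so the result depends only on the square-root-scaled intensities.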
neuclidean: the normalized euclidean distance is described in Stein and Scott 1994 as: NED = (1 + \frac{\sum (W_1 - W_2)^2}{\sum (W_2)^2})^{-1}; where W_i = x^m * y^n, and x and y are the m/z and intensity values, respectively. See the details about ndotproduct for an explanation of how to set m and n.
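A minimal base R sketch of this formula is shown here. The name ned_sketch and the matrix z are hypothetical, and the inputs are assumed to be aligned two-column matrices without missing values. Because the formula normalizes by W_2 only, swapping the arguments can change the result, which the last two calls illustrate.

```r
# Sketch of the normalized euclidean distance from the formula above.
# Assumes aligned two-column matrices (m/z, intensity) without NA values.
ned_sketch <- function(x, y, m = 0, n = 0.5) {
    wx <- x[, 1]^m * x[, 2]^n
    wy <- y[, 1]^m * y[, 2]^n
    1 / (1 + sum((wx - wy)^2) / sum(wy^2))
}

x <- matrix(c(1:5, 1:5), ncol = 2, dimnames = list(NULL, c("mz", "intensity")))
y <- matrix(c(1:5, 5:1), ncol = 2, dimnames = list(NULL, c("mz", "intensity")))
# Hypothetical third spectrum: same peaks as x with doubled intensities.
z <- matrix(c(1:5, 2 * (1:5)), ncol = 2,
            dimnames = list(NULL, c("mz", "intensity")))

ned_sketch(x, x)  # identical spectra: 1
ned_sketch(x, y)
ned_sketch(x, z)  # normalized by the weights of z
ned_sketch(z, x)  # normalized by the weights of x, so the value differs
```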
navdist: the normalized absolute values distance is described in Stein and Scott 1994 as: NAVD = (1 + \frac{\sum |W_1 - W_2|}{\sum W_2})^{-1}; where W_i = x^m * y^n, and x and y are the m/z and intensity values, respectively. See the details about ndotproduct for an explanation of how to set m and n.
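The same pattern applies to the absolute values distance. The following base R sketch is derived from the formula above under the same assumptions (aligned two-column matrices, no missing values); the name navd_sketch is hypothetical.

```r
# Sketch of the normalized absolute values distance from the formula above.
# Assumes aligned two-column matrices (m/z, intensity) without NA values.
navd_sketch <- function(x, y, m = 0, n = 0.5) {
    wx <- x[, 1]^m * x[, 2]^n
    wy <- y[, 1]^m * y[, 2]^n
    1 / (1 + sum(abs(wx - wy)) / sum(wy))
}

x <- matrix(c(1:5, 1:5), ncol = 2, dimnames = list(NULL, c("mz", "intensity")))
y <- matrix(c(1:5, 5:1), ncol = 2, dimnames = list(NULL, c("mz", "intensity")))

navd_sketch(x, x)  # identical spectra: 1
navd_sketch(x, y)  # differing intensities give a value below 1
```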
nspectraangle: the normalized spectra angle is described in Toprak et al. 2014 as: NSA = 1 - \frac{2 \cos^{-1}(W_1 \cdot W_2)}{\pi}; where W_i = x^m * y^n, and x and y are the m/z and intensity values, respectively. The weighting was not originally proposed by Toprak et al. 2014. See the details about ndotproduct for an explanation of how to set m and n.
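The spectra angle can be sketched in base R as below. The inverse cosine is only defined for arguments between -1 and 1, so this sketch assumes that both weight vectors are scaled to unit length before the dot product is taken; that scaling step is an assumption of the sketch and may differ from how the package normalizes its input. The name nsa_sketch is hypothetical.

```r
# Sketch of the normalized spectra angle from the formula above.
# Assumes aligned two-column matrices (m/z, intensity) without NA values.
nsa_sketch <- function(x, y, m = 0, n = 0.5) {
    wx <- x[, 1]^m * x[, 2]^n
    wy <- y[, 1]^m * y[, 2]^n
    # Assumption: scale both weight vectors to unit length so that their
    # dot product is a cosine.
    wx <- wx / sqrt(sum(wx^2))
    wy <- wy / sqrt(sum(wy^2))
    # Clamp to [-1, 1] to guard against rounding errors before acos().
    cosine <- min(1, max(-1, sum(wx * wy)))
    1 - 2 * acos(cosine) / pi
}

x <- matrix(c(1:5, 1:5), ncol = 2, dimnames = list(NULL, c("mz", "intensity")))
y <- matrix(c(1:5, 5:1), ncol = 2, dimnames = list(NULL, c("mz", "intensity")))

nsa_sketch(x, x)  # identical spectra: 1, up to rounding
nsa_sketch(x, y)  # a larger angle between the vectors gives a lower score
```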
Returns a double(1) value between 0 and 1, where 0 means completely different and 1 means identical.
These methods are implemented as described in Stein and Scott 1994 (navdist, ndotproduct, neuclidean) and Toprak et al. 2014 (nspectraangle), but because there is no reference implementation available we are unable to guarantee that the results are identical. Please see also the corresponding discussion at the GitHub pull request linked below. If you find any problems or a reference implementation, please open an issue at https://github.com/rformassspectrometry/MsCoreUtils/issues.
navdist, neuclidean, nspectraangle: Sebastian Gibb
ndotproduct: Sebastian Gibb and Thomas Naake, thomasnaake@googlemail.com
Stein, S. E., and Scott, D. R. (1994). Optimization and testing of mass spectral library search algorithms for compound identification. Journal of the American Society for Mass Spectrometry, 5(9), 859–866. doi: 10.1016/1044-0305(94)87009-8.
Horai et al. (2010). MassBank: a public repository for sharing mass spectral data for life sciences. Journal of mass spectrometry, 45(7), 703–714. doi: 10.1002/jms.1777.
Toprak et al. (2014). Conserved peptide fragmentation as a benchmarking tool for mass spectrometers and a discriminating feature for targeted proteomics. Molecular & Cellular Proteomics : MCP, 13(8), 2056–2071. doi: 10.1074/mcp.O113.036475.
Pull Request for these distance/similarity measurements: https://github.com/rformassspectrometry/MsCoreUtils/pull/33
x <- matrix(c(1:5, 1:5), ncol = 2, dimnames = list(c(), c("mz", "intensity")))
y <- matrix(c(1:5, 5:1), ncol = 2, dimnames = list(c(), c("mz", "intensity")))

ndotproduct(x, y)
ndotproduct(x, y, m = 2, n = 0.5)
ndotproduct(x, y, m = 3, n = 0.6)

neuclidean(x, y)

navdist(x, y)

nspectraangle(x, y)