# Global Alignment Kernels (GAK)

A method that makes distances (similarities) between complex data points, such as time-series data, computable.

• In effect, it newly defines a distance on the space.
• GAK is a kernel method for computing the distance between two time-series data points.
  - The series may include stops, slowdowns, and accelerations along the time axis.
  - A kernel method is a mathematically proven shortcut, in terms of computational effort, for computing the distance between two points in a space of any dimension; it works as long as Mercer's theorem is satisfied.
• Differences in any direction reduce the similarity.

## Kernel method

### Kernel

The kernel $k(x_i, x_j)$ is the inner product $\phi(x_i) \cdot \phi(x_j)$ of two vectors in a higher-dimensional space, mapped from the two vectors $x_i, x_j$ in a lower-dimensional space.
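As a small illustration (my own example, not taken from the text): the homogeneous polynomial kernel on $\mathbb{R}^2$ can be checked against its explicit feature map.

```python
import numpy as np

# Sketch: the polynomial kernel k(x, y) = (x . y)^2 on R^2 equals the
# inner product of the explicit map phi(x) = (x1^2, sqrt(2) x1 x2, x2^2)
# into R^3 -- the kernel evaluates the high-dimensional inner product
# without ever constructing phi.
def k(x, y):
    return np.dot(x, y) ** 2

def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
print(k(x, y))                 # 121.0
print(np.dot(phi(x), phi(y)))  # 121.0, up to floating-point round-off
```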

### Positive semidefinite kernel

• $k(x_i, x_j) = k(x_j, x_i)$
• every eigenvalue of the Gram matrix $\begin{pmatrix} k(x_1, x_1) & \cdots & k(x_1, x_n) \\ \vdots & \ddots & \vdots \\ k(x_n, x_1) & \cdots & k(x_n, x_n) \end{pmatrix}$ is larger than or equal to 0.
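A quick numeric check of these two conditions (my own sketch, using the Gaussian RBF kernel as an example):

```python
import numpy as np

# Sketch: build the Gram matrix K_ij = k(x_i, x_j) for the RBF kernel
# k(a, b) = exp(-(a - b)^2) and verify both conditions numerically.
X = np.array([0.0, 1.0, 2.5, 4.0])
K = np.exp(-(X[:, None] - X[None, :]) ** 2)

print(np.allclose(K, K.T))              # True: k(x_i, x_j) = k(x_j, x_i)
print(np.linalg.eigvalsh(K) >= -1e-10)  # all True: eigenvalues >= 0
```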

### Reproducing Kernel Hilbert Space (RKHS)

• $\forall x \in \Omega$ (the input space), $\exists \phi_x \in H$ (a higher-dimensional function space)
• $\forall f \in H,\ \langle f, \phi_x \rangle = f(x)$
$\phi_x$ is called the "reproducing kernel".

### Positive semidefinite kernel and RKHS

• Putting these together: a positive semidefinite kernel defines an RKHS.
• An inner product in that space can then be computed as a function evaluation.

ref (Japanese): カーネル法 正定値カーネルを用いたデータ解析 ("Kernel Methods: Data Analysis with Positive-Definite Kernels"), 朱鷺の杜Wiki

## Algorithm

1. Stretch/warp the time axes of the two series arbitrarily and independently.
2. Create the full set of "alignments", each consisting of two index sequences (produced by the warping).
• An alignment, for example: assume two time-series data $x$ and $y$ of length 5,
$x = (0.0, 1.0, 2.0, 3.0, 3.0)$
$y = (0.0, 1.0, 1.0, 2.0, 3.0)$,
and they are warped (aligned) into similar sequences, such as
$x' = (0.0, 1.0, 1.0, 2.0, 3.0, 3.0)$
$y' = (0.0, 1.0, 1.0, 2.0, 3.0, 3.0)$.
The values in $x'$ and $y'$ can then be described with the indices of $x$ and $y$ as
$\pi_1 = (1, 2, 2, 3, 4, 5)$
$\pi_2 = (1, 2, 3, 4, 5, 5)$,
hence the alignment $\pi$ is
$\pi = ((1,1), (2,2), (2,3), (3,4), (4,5), (5,5))$.
The step ratios $\frac{\pi_1(i+1) - \pi_1(i)}{\pi_2(i+1) - \pi_2(i)}$ are $(\frac{1}{1}, \frac{0}{1}, \frac{1}{1}, \frac{1}{1}, \frac{1}{0})$.
To avoid meaningless warping, the restriction $\frac{\pi_1(i+1) - \pi_1(i)}{\pi_2(i+1) - \pi_2(i)} \in \left\{\frac{0}{1}, \frac{1}{0}, \frac{1}{1}\right\}$ is imposed.
3. Compare the aligned values to calculate the similarity.
• Dynamic Time Warping (DTW): search for the single alignment with the highest similarity (lowest total cost) between the two sequences. DTW is a classic method for comparing two time-series data.
• GAK: sum the exponentiated, sign-flipped costs over every alignment.
• Note that only the values of the corresponding data points enter the calculation, not the time stamps themselves.
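The steps above can be sketched in a few lines (my own simplified code, not the reference implementation: the real `tga_dissimilarity` uses a different local kernel and log-domain arithmetic). The check mirrors the step restriction from step 2; the two recursions differ only in taking a minimum (DTW) versus a sum (GAK):

```python
import numpy as np

# Step-2 restriction: consecutive index pairs of an alignment may only
# advance by (0,1), (1,0), or (1,1).
ALLOWED_STEPS = {(0, 1), (1, 0), (1, 1)}

def is_valid_alignment(pi):
    return all((b1 - a1, b2 - a2) in ALLOWED_STEPS
               for (a1, a2), (b1, b2) in zip(pi, pi[1:]))

# Step 3, DTW flavour: keep only the single best (minimum-cost) alignment.
def dtw(x, y):
    D = np.full((len(x) + 1, len(y) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[-1, -1]

# Step 3, GAK flavour: sum exp(-cost) contributions over *all* alignments
# using the same recursion, with a sum instead of a minimum.
def gak(x, y, sigma=1.0):
    K = np.zeros((len(x) + 1, len(y) + 1))
    K[0, 0] = 1.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            k_local = np.exp(-((x[i - 1] - y[j - 1]) ** 2) / (2 * sigma ** 2))
            K[i, j] = k_local * (K[i - 1, j] + K[i, j - 1] + K[i - 1, j - 1])
    return K[-1, -1]

x = np.array([0.0, 1.0, 2.0, 3.0, 3.0])
y = np.array([0.0, 1.0, 1.0, 2.0, 3.0])
pi = [(1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (5, 5)]
print(is_valid_alignment(pi))  # True: this alignment obeys the restriction
print(dtw(x, y))               # 0.0: a perfect warping of x onto y exists
print(gak(x, y))               # sums a positive term for every alignment
```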

## Implementation

### Parameters

• Kernel bandwidth $\sigma$:
`sigma = 0.5*(T1+T2)/2*np.sqrt((T1+T2)/2)`
The web site says: "The Bandwidth sigma can be set as a multiple of a simple estimate of the median distance of different points observed in different time-series of your training set, scaled by the square root of the median length of time-series in the training set."
I'm not sure yet; the implementation above does not seem to match that description.
• Triangular parameter $T$ bounds the maximum index difference between aligned data points, $-T < \pi_1(i) - \pi_2(i) < T$. $T=0$ indicates no bound.
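Two sketches of these parameters (my own reading; the function names are illustrative, not from the reference implementation): the bandwidth estimate as literally described in the quote above, and the band the triangular parameter imposes.

```python
import numpy as np

# Bandwidth: the quoted estimate -- a multiple of the median distance
# between points drawn from *different* series, scaled by the square
# root of the median series length. (Hypothetical helper.)
def estimate_sigma(seqs, multiple=1.0):
    dists = [np.abs(a - b)
             for i, s1 in enumerate(seqs)
             for s2 in seqs[i + 1:]
             for a in s1 for b in s2]
    med_len = np.median([len(s) for s in seqs])
    return multiple * np.median(dists) * np.sqrt(med_len)

# Triangular parameter: an aligned pair (i, j) only contributes when
# |i - j| < T; T = 0 disables the bound.
def within_band(i, j, T):
    return T == 0 or abs(i - j) < T

print(estimate_sigma([np.arange(10.0), np.arange(11.0)]))
# With T=1 only the diagonal survives, so two series of different lengths
# can never reach the final corner cell and the kernel value collapses to 0:
print([(i, j) for i in range(1, 4) for j in range(1, 4) if within_band(i, j, 1)])
# [(1, 1), (2, 2), (3, 3)]
```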

### Playing GAK

#### Modifying test.py into my_test.py

```python
import numpy as np
import global_align as ga

def test(d=1, T1=10, T2=11):
    # two hand-crafted time series of dimension d and length T1, resp. T2
    seq1 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0]])
    seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [7.0], [8.0], [9.0]])
    #seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0], [10.0]])
    # define the sigma parameter
    sigma = 0.5 * (T1 + T2) / 2 * np.sqrt((T1 + T2) / 2)
    # compute the global alignment kernel value for different triangular parameters
    diff_t = np.abs(T1 - T2)
    for triangular in range(0, 11):
        val = ga.tga_dissimilarity(seq1, seq2, sigma, triangular)
        kval = np.exp(-val)
        if 0 < triangular <= diff_t:
            # for 0 < triangular <= diff_t, exp(-tga_d) == 0
            assert kval == 0
        print("T=%d \t exp(-tga_d)=%0.5f" % (triangular, kval))
```


#### Results

• Case where the lengths are the same and one series stops (repeats its last value).
input:

```python
seq1 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0], [9.0]])
seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0], [10.0]])
```


result:

```
T=0 	 exp(-tga_d)=0.99701
T=1 	 exp(-tga_d)=0.99700
T=2 	 exp(-tga_d)=0.99700
T=3 	 exp(-tga_d)=0.99700
T=4 	 exp(-tga_d)=0.99701
T=5 	 exp(-tga_d)=0.99701
T=6 	 exp(-tga_d)=0.99701
T=7 	 exp(-tga_d)=0.99701
T=8 	 exp(-tga_d)=0.99701
T=9 	 exp(-tga_d)=0.99701
T=10 	 exp(-tga_d)=0.99701
```

• Case where the lengths differ and the longer series keeps changing.
input:

```python
seq1 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0]])
seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0], [10.0]])
```


result:

```
T=0 	 exp(-tga_d)=0.94617
T=1 	 exp(-tga_d)=0.00000
T=2 	 exp(-tga_d)=0.70536
T=3 	 exp(-tga_d)=0.86143
T=4 	 exp(-tga_d)=0.91535
T=5 	 exp(-tga_d)=0.93685
T=6 	 exp(-tga_d)=0.94418
T=7 	 exp(-tga_d)=0.94590
T=8 	 exp(-tga_d)=0.94615
T=9 	 exp(-tga_d)=0.94617
T=10 	 exp(-tga_d)=0.94617
```

• Case where one time series slows down in the middle of the change.
input:

```python
seq1 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0]])
seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [7.0], [8.0], [9.0]])
```


result:

```
T=0 	 exp(-tga_d)=0.95889
T=1 	 exp(-tga_d)=0.00000
T=2 	 exp(-tga_d)=0.70348
T=3 	 exp(-tga_d)=0.86528
T=4 	 exp(-tga_d)=0.92346
T=5 	 exp(-tga_d)=0.94772
T=6 	 exp(-tga_d)=0.95639
T=7 	 exp(-tga_d)=0.95853
T=8 	 exp(-tga_d)=0.95886
T=9 	 exp(-tga_d)=0.95889
T=10 	 exp(-tga_d)=0.95889
```

• Case where the two time series slow down at different points.
input:

```python
seq1 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0], [9.0]])
seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [7.0], [8.0], [9.0]])
```


result:

```
T=0 	 exp(-tga_d)=0.99494
T=1 	 exp(-tga_d)=0.99401
T=2 	 exp(-tga_d)=0.99291
T=3 	 exp(-tga_d)=0.99401
T=4 	 exp(-tga_d)=0.99460
T=5 	 exp(-tga_d)=0.99483
T=6 	 exp(-tga_d)=0.99491
T=7 	 exp(-tga_d)=0.99493
T=8 	 exp(-tga_d)=0.99494
T=9 	 exp(-tga_d)=0.99494
T=10 	 exp(-tga_d)=0.99494
```


#### Consideration

• $T=0$ (no bound) and $T=9$ give the same values in all examples: no bound includes every alignment, and $T=9$ is wide enough that any alignments it still excludes contribute nothing visible at this precision.
• $T=1$ does not allow any shifting, hence the examples with time series of different lengths yield 0 for $T=1$.
• In every example, the difference between $T=2$ and $T=3$ reflects only the sum of non-dominating terms, because a shift of $1$ is already enough to produce the best matching in these examples.