# Global Alignment Kernels (GAK)

A method that makes distances (similarities) between complex data points, such as time-series data, computable.

• In effect, it newly defines a distance on the space.
• GAK is a kernel method for computing the distance between two time-series data points.
  - The series may include stops, slowdowns, and accelerations along the time axis.
  - A kernel method is a mathematically proven shortcut, in terms of computational effort, for computing the distance between two points in a space of any dimension; it works as long as Mercer's theorem is satisfied.
• Differences in any direction reduce the similarity.

## Kernel method

### Kernel

The kernel $k(x_i, x_j)$ is the inner product $\phi(x_i) \cdot \phi(x_j)$ of two vectors in a higher-dimensional space, mapped from the two vectors $x_i, x_j$ in a lower-dimensional space.
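As a small illustration (my own example, not taken from the text): the homogeneous polynomial kernel on $\mathbb{R}^2$ can be checked against its explicit feature map.

```python
import numpy as np

# Sketch: the polynomial kernel k(x, y) = (x . y)^2 on R^2 equals the
# inner product of the explicit map phi(x) = (x1^2, sqrt(2) x1 x2, x2^2)
# into R^3 -- the kernel evaluates the high-dimensional inner product
# without ever constructing phi.
def k(x, y):
    return np.dot(x, y) ** 2

def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
print(k(x, y))                 # 121.0
print(np.dot(phi(x), phi(y)))  # 121.0, up to floating-point round-off
```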

### Positive semidefinite kernel

• $k(x_i, x_j) = k(x_j, x_i)$
• every eigenvalue of the Gram matrix $\begin{pmatrix} k(x_1, x_1) & \cdots & k(x_1, x_n) \\ \vdots & \ddots & \vdots \\ k(x_n, x_1) & \cdots & k(x_n, x_n) \end{pmatrix}$ is larger than or equal to 0.
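A quick numeric check of these two conditions (my own sketch, using the Gaussian RBF kernel as an example):

```python
import numpy as np

# Sketch: build the Gram matrix K_ij = k(x_i, x_j) for the RBF kernel
# k(a, b) = exp(-(a - b)^2) and verify both conditions numerically.
X = np.array([0.0, 1.0, 2.5, 4.0])
K = np.exp(-(X[:, None] - X[None, :]) ** 2)

print(np.allclose(K, K.T))              # True: k(x_i, x_j) = k(x_j, x_i)
print(np.linalg.eigvalsh(K) >= -1e-10)  # all True: eigenvalues >= 0
```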

### Reproducing Kernel Hilbert Space (RKHS)

• $\forall x \in \Omega$ (the input space), $\exists \phi_x \in H$ (a higher-dimensional function space)
• $\forall f \in H,\ \langle f, \phi_x \rangle = f(x)$
$\phi_x$ is called the "reproducing kernel".

### Positive semidefinite kernel and RKHS

• Putting these together: a positive semidefinite kernel defines an RKHS.
• An inner product in that space can then be computed as a function evaluation.

ref (Japanese): カーネル法 正定値カーネルを用いたデータ解析 ("Kernel Methods: Data Analysis with Positive-Definite Kernels"), 朱鷺の杜Wiki

## Algorithm

1. Stretch/warp the time axes of the two series arbitrarily and independently.
2. Create the full set of "alignments", each consisting of two index sequences (produced by the warping).
• An alignment, for example: assume two time-series data $x$ and $y$ of length 5,
$x = (0.0, 1.0, 2.0, 3.0, 3.0)$
$y = (0.0, 1.0, 1.0, 2.0, 3.0)$,
and they are warped (aligned) into similar sequences, such as
$x' = (0.0, 1.0, 1.0, 2.0, 3.0, 3.0)$
$y' = (0.0, 1.0, 1.0, 2.0, 3.0, 3.0)$.
The values in $x'$ and $y'$ can then be described with the indices of $x$ and $y$ as
$\pi_1 = (1, 2, 2, 3, 4, 5)$
$\pi_2 = (1, 2, 3, 4, 5, 5)$,
hence the alignment $\pi$ is
$\pi = ((1,1), (2,2), (2,3), (3,4), (4,5), (5,5))$.
The step ratios $\frac{\pi_1(i+1) - \pi_1(i)}{\pi_2(i+1) - \pi_2(i)}$ are $(\frac{1}{1}, \frac{0}{1}, \frac{1}{1}, \frac{1}{1}, \frac{1}{0})$.
To avoid meaningless warping, the restriction $\frac{\pi_1(i+1) - \pi_1(i)}{\pi_2(i+1) - \pi_2(i)} \in \left\{\frac{0}{1}, \frac{1}{0}, \frac{1}{1}\right\}$ is imposed.
3. Compare the aligned values to calculate the similarity.
• Dynamic Time Warping (DTW): search for the single alignment with the highest similarity (lowest total cost) between the two sequences. DTW is a classic method for comparing two time-series data.
• GAK: sum the exponentiated, sign-flipped costs over every alignment.
• Note that only the values of the corresponding data points enter the calculation, not the time stamps themselves.
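The steps above can be sketched in a few lines (my own simplified code, not the reference implementation: the real `tga_dissimilarity` uses a different local kernel and log-domain arithmetic). The check mirrors the step restriction from step 2; the two recursions differ only in taking a minimum (DTW) versus a sum (GAK):

```python
import numpy as np

# Step-2 restriction: consecutive index pairs of an alignment may only
# advance by (0,1), (1,0), or (1,1).
ALLOWED_STEPS = {(0, 1), (1, 0), (1, 1)}

def is_valid_alignment(pi):
    return all((b1 - a1, b2 - a2) in ALLOWED_STEPS
               for (a1, a2), (b1, b2) in zip(pi, pi[1:]))

# Step 3, DTW flavour: keep only the single best (minimum-cost) alignment.
def dtw(x, y):
    D = np.full((len(x) + 1, len(y) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[-1, -1]

# Step 3, GAK flavour: sum exp(-cost) contributions over *all* alignments
# using the same recursion, with a sum instead of a minimum.
def gak(x, y, sigma=1.0):
    K = np.zeros((len(x) + 1, len(y) + 1))
    K[0, 0] = 1.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            k_local = np.exp(-((x[i - 1] - y[j - 1]) ** 2) / (2 * sigma ** 2))
            K[i, j] = k_local * (K[i - 1, j] + K[i, j - 1] + K[i - 1, j - 1])
    return K[-1, -1]

x = np.array([0.0, 1.0, 2.0, 3.0, 3.0])
y = np.array([0.0, 1.0, 1.0, 2.0, 3.0])
pi = [(1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (5, 5)]
print(is_valid_alignment(pi))  # True: this alignment obeys the restriction
print(dtw(x, y))               # 0.0: a perfect warping of x onto y exists
print(gak(x, y))               # sums a positive term for every alignment
```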

## Implementation

### Parameters

• Kernel bandwidth $\sigma$:
`sigma = 0.5*(T1+T2)/2*np.sqrt((T1+T2)/2)`
The web site says: "The Bandwidth sigma can be set as a multiple of a simple estimate of the median distance of different points observed in different time-series of your training set, scaled by the square root of the median length of time-series in the training set."
I'm not sure yet; the implementation above does not seem to match that description.
• Triangular parameter $T$ bounds the maximum index difference between aligned data points, $-T < \pi_1(i) - \pi_2(i) < T$. $T=0$ indicates no bound.
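Two sketches of these parameters (my own reading; the function names are illustrative, not from the reference implementation): the bandwidth estimate as literally described in the quote above, and the band the triangular parameter imposes.

```python
import numpy as np

# Bandwidth: the quoted estimate -- a multiple of the median distance
# between points drawn from *different* series, scaled by the square
# root of the median series length. (Hypothetical helper.)
def estimate_sigma(seqs, multiple=1.0):
    dists = [np.abs(a - b)
             for i, s1 in enumerate(seqs)
             for s2 in seqs[i + 1:]
             for a in s1 for b in s2]
    med_len = np.median([len(s) for s in seqs])
    return multiple * np.median(dists) * np.sqrt(med_len)

# Triangular parameter: an aligned pair (i, j) only contributes when
# |i - j| < T; T = 0 disables the bound.
def within_band(i, j, T):
    return T == 0 or abs(i - j) < T

print(estimate_sigma([np.arange(10.0), np.arange(11.0)]))
# With T=1 only the diagonal survives, so two series of different lengths
# can never reach the final corner cell and the kernel value collapses to 0:
print([(i, j) for i in range(1, 4) for j in range(1, 4) if within_band(i, j, 1)])
# [(1, 1), (2, 2), (3, 3)]
```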

### Playing GAK

#### Modifying test.py into my_test.py

```python
import numpy as np
import global_align as ga

def test(d=1, T1=10, T2=11):
    # two hand-crafted time series of dimension d and length T1, resp. T2
    seq1 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0]])
    seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [7.0], [8.0], [9.0]])
    #seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0], [10.0]])
    # define the sigma parameter
    sigma = 0.5 * (T1 + T2) / 2 * np.sqrt((T1 + T2) / 2)
    # compute the global alignment kernel value for different triangular parameters
    diff_t = np.abs(T1 - T2)
    for triangular in range(0, 11):
        val = ga.tga_dissimilarity(seq1, seq2, sigma, triangular)
        kval = np.exp(-val)
        if 0 < triangular <= diff_t:
            # for 0 < triangular <= diff_t, exp(-tga_d) == 0
            assert kval == 0
        print("T=%d \t exp(-tga_d)=%0.5f" % (triangular, kval))
```


#### Results

• Case where the lengths are the same and one series stops (repeats its last value).
input:

```python
seq1 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0], [9.0]])
seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0], [10.0]])
```


result:

```
T=0 	 exp(-tga_d)=0.99701
T=1 	 exp(-tga_d)=0.99700
T=2 	 exp(-tga_d)=0.99700
T=3 	 exp(-tga_d)=0.99700
T=4 	 exp(-tga_d)=0.99701
T=5 	 exp(-tga_d)=0.99701
T=6 	 exp(-tga_d)=0.99701
T=7 	 exp(-tga_d)=0.99701
T=8 	 exp(-tga_d)=0.99701
T=9 	 exp(-tga_d)=0.99701
T=10 	 exp(-tga_d)=0.99701
```

• Case where the lengths differ and the longer series keeps changing.
input:

```python
seq1 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0]])
seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0], [10.0]])
```


result:

```
T=0 	 exp(-tga_d)=0.94617
T=1 	 exp(-tga_d)=0.00000
T=2 	 exp(-tga_d)=0.70536
T=3 	 exp(-tga_d)=0.86143
T=4 	 exp(-tga_d)=0.91535
T=5 	 exp(-tga_d)=0.93685
T=6 	 exp(-tga_d)=0.94418
T=7 	 exp(-tga_d)=0.94590
T=8 	 exp(-tga_d)=0.94615
T=9 	 exp(-tga_d)=0.94617
T=10 	 exp(-tga_d)=0.94617
```

• Case where one time series slows down in the middle of the change.
input:

```python
seq1 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0]])
seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [7.0], [8.0], [9.0]])
```


result:

```
T=0 	 exp(-tga_d)=0.95889
T=1 	 exp(-tga_d)=0.00000
T=2 	 exp(-tga_d)=0.70348
T=3 	 exp(-tga_d)=0.86528
T=4 	 exp(-tga_d)=0.92346
T=5 	 exp(-tga_d)=0.94772
T=6 	 exp(-tga_d)=0.95639
T=7 	 exp(-tga_d)=0.95853
T=8 	 exp(-tga_d)=0.95886
T=9 	 exp(-tga_d)=0.95889
T=10 	 exp(-tga_d)=0.95889
```

• Case where the two time series slow down at different points.
input:

```python
seq1 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0], [9.0]])
seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [7.0], [8.0], [9.0]])
```


result:

```
T=0 	 exp(-tga_d)=0.99494
T=1 	 exp(-tga_d)=0.99401
T=2 	 exp(-tga_d)=0.99291
T=3 	 exp(-tga_d)=0.99401
T=4 	 exp(-tga_d)=0.99460
T=5 	 exp(-tga_d)=0.99483
T=6 	 exp(-tga_d)=0.99491
T=7 	 exp(-tga_d)=0.99493
T=8 	 exp(-tga_d)=0.99494
T=9 	 exp(-tga_d)=0.99494
T=10 	 exp(-tga_d)=0.99494
```


#### Consideration

• $T=0$ (no bound) and $T=9$ give the same values in all examples: no bound includes every alignment, and $T=9$ is wide enough that any alignments it still excludes contribute nothing visible at this precision.
• $T=1$ does not allow any shifting, hence the examples with time series of different lengths yield 0 for $T=1$.
• In every example, the difference between $T=2$ and $T=3$ reflects only the sum of non-dominating terms, because a shift of $1$ is already enough to produce the best matching in these examples.