# Global Alignment Kernels (GAK)

A method that makes distances (similarities) between complex data points, such as time-series data, computable.

• In effect, it newly defines a distance on the space.
• GAK is a kernel method for computing the distance between two time-series data points.
  - The series may include stops, slowdowns, and accelerations along the time axis.
  - A kernel method is a mathematically proven shortcut, in terms of computational effort, for computing the distance between two points in a space of any dimension; it works as long as Mercer's theorem is satisfied.
• Differences in any direction reduce the similarity.

## Kernel method

### Kernel

The kernel $k(x_i, x_j)$ is the inner product $\phi(x_i) \cdot \phi(x_j)$ of two vectors in a higher-dimensional space, mapped from the two vectors $x_i, x_j$ in a lower-dimensional space.
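As a small illustration (my own example, not taken from the text): the homogeneous polynomial kernel on $\mathbb{R}^2$ can be checked against its explicit feature map.

```python
import numpy as np

# Sketch: the polynomial kernel k(x, y) = (x . y)^2 on R^2 equals the
# inner product of the explicit map phi(x) = (x1^2, sqrt(2) x1 x2, x2^2)
# into R^3 -- the kernel evaluates the high-dimensional inner product
# without ever constructing phi.
def k(x, y):
    return np.dot(x, y) ** 2

def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
print(k(x, y))                 # 121.0
print(np.dot(phi(x), phi(y)))  # 121.0, up to floating-point round-off
```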

### Positive semidefinite kernel

• $k(x_i, x_j) = k(x_j, x_i)$
• every eigenvalue of the Gram matrix $\begin{pmatrix} k(x_1, x_1) & \cdots & k(x_1, x_n) \\ \vdots & \ddots & \vdots \\ k(x_n, x_1) & \cdots & k(x_n, x_n) \end{pmatrix}$ is larger than or equal to 0.
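A quick numeric check of these two conditions (my own sketch, using the Gaussian RBF kernel as an example):

```python
import numpy as np

# Sketch: build the Gram matrix K_ij = k(x_i, x_j) for the RBF kernel
# k(a, b) = exp(-(a - b)^2) and verify both conditions numerically.
X = np.array([0.0, 1.0, 2.5, 4.0])
K = np.exp(-(X[:, None] - X[None, :]) ** 2)

print(np.allclose(K, K.T))              # True: k(x_i, x_j) = k(x_j, x_i)
print(np.linalg.eigvalsh(K) >= -1e-10)  # all True: eigenvalues >= 0
```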

### Reproducing Kernel Hilbert Space (RKHS)

• $\forall x \in \Omega$ (the input space), $\exists \phi_x \in H$ (a higher-dimensional function space)
• $\forall f \in H,\ \langle f, \phi_x \rangle = f(x)$
$\phi_x$ is called the "reproducing kernel".

### Positive semidefinite kernel and RKHS

• Putting these together: a positive semidefinite kernel defines an RKHS.
• An inner product in that space can then be computed as a function evaluation.

ref (Japanese): カーネル法 正定値カーネルを用いたデータ解析 ("Kernel Methods: Data Analysis with Positive-Definite Kernels"), 朱鷺の杜Wiki

## Algorithm

1. Stretch/warp the time axes of the two series arbitrarily and independently.
2. Create the full set of "alignments", each consisting of two index sequences (produced by the warping).
• An alignment, for example: assume two time-series data $x$ and $y$ of length 5,
$x = (0.0, 1.0, 2.0, 3.0, 3.0)$
$y = (0.0, 1.0, 1.0, 2.0, 3.0)$,
and they are warped (aligned) into similar sequences, such as
$x' = (0.0, 1.0, 1.0, 2.0, 3.0, 3.0)$
$y' = (0.0, 1.0, 1.0, 2.0, 3.0, 3.0)$.
The values in $x'$ and $y'$ can then be described with the indices of $x$ and $y$ as
$\pi_1 = (1, 2, 2, 3, 4, 5)$
$\pi_2 = (1, 2, 3, 4, 5, 5)$,
hence the alignment $\pi$ is
$\pi = ((1,1), (2,2), (2,3), (3,4), (4,5), (5,5))$.
The step ratios $\frac{\pi_1(i+1) - \pi_1(i)}{\pi_2(i+1) - \pi_2(i)}$ are $(\frac{1}{1}, \frac{0}{1}, \frac{1}{1}, \frac{1}{1}, \frac{1}{0})$.
To avoid meaningless warping, the restriction $\frac{\pi_1(i+1) - \pi_1(i)}{\pi_2(i+1) - \pi_2(i)} \in \left\{\frac{0}{1}, \frac{1}{0}, \frac{1}{1}\right\}$ is imposed.
3. Compare the aligned values to calculate the similarity.
• Dynamic Time Warping (DTW): search for the single alignment with the highest similarity (lowest total cost) between the two sequences. DTW is a classic method for comparing two time-series data.
• GAK: sum the exponentiated, sign-flipped costs over every alignment.
• Note that only the values of the corresponding data points enter the calculation, not the time stamps themselves.
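The steps above can be sketched in a few lines (my own simplified code, not the reference implementation: the real `tga_dissimilarity` uses a different local kernel and log-domain arithmetic). The check mirrors the step restriction from step 2; the two recursions differ only in taking a minimum (DTW) versus a sum (GAK):

```python
import numpy as np

# Step-2 restriction: consecutive index pairs of an alignment may only
# advance by (0,1), (1,0), or (1,1).
ALLOWED_STEPS = {(0, 1), (1, 0), (1, 1)}

def is_valid_alignment(pi):
    return all((b1 - a1, b2 - a2) in ALLOWED_STEPS
               for (a1, a2), (b1, b2) in zip(pi, pi[1:]))

# Step 3, DTW flavour: keep only the single best (minimum-cost) alignment.
def dtw(x, y):
    D = np.full((len(x) + 1, len(y) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[-1, -1]

# Step 3, GAK flavour: sum exp(-cost) contributions over *all* alignments
# using the same recursion, with a sum instead of a minimum.
def gak(x, y, sigma=1.0):
    K = np.zeros((len(x) + 1, len(y) + 1))
    K[0, 0] = 1.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            k_local = np.exp(-((x[i - 1] - y[j - 1]) ** 2) / (2 * sigma ** 2))
            K[i, j] = k_local * (K[i - 1, j] + K[i, j - 1] + K[i - 1, j - 1])
    return K[-1, -1]

x = np.array([0.0, 1.0, 2.0, 3.0, 3.0])
y = np.array([0.0, 1.0, 1.0, 2.0, 3.0])
pi = [(1, 1), (2, 2), (2, 3), (3, 4), (4, 5), (5, 5)]
print(is_valid_alignment(pi))  # True: this alignment obeys the restriction
print(dtw(x, y))               # 0.0: a perfect warping of x onto y exists
print(gak(x, y))               # sums a positive term for every alignment
```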

## Implementation

### Parameters

• Kernel bandwidth $\sigma$:
`sigma = 0.5*(T1+T2)/2*np.sqrt((T1+T2)/2)`
The web site says: "The Bandwidth sigma can be set as a multiple of a simple estimate of the median distance of different points observed in different time-series of your training set, scaled by the square root of the median length of time-series in the training set."
I'm not sure yet; the implementation above does not seem to match that description.
• Triangular parameter $T$ bounds the maximum index difference between aligned data points, $-T < \pi_1(i) - \pi_2(i) < T$. $T=0$ indicates no bound.
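Two sketches of these parameters (my own reading; the function names are illustrative, not from the reference implementation): the bandwidth estimate as literally described in the quote above, and the band the triangular parameter imposes.

```python
import numpy as np

# Bandwidth: the quoted estimate -- a multiple of the median distance
# between points drawn from *different* series, scaled by the square
# root of the median series length. (Hypothetical helper.)
def estimate_sigma(seqs, multiple=1.0):
    dists = [np.abs(a - b)
             for i, s1 in enumerate(seqs)
             for s2 in seqs[i + 1:]
             for a in s1 for b in s2]
    med_len = np.median([len(s) for s in seqs])
    return multiple * np.median(dists) * np.sqrt(med_len)

# Triangular parameter: an aligned pair (i, j) only contributes when
# |i - j| < T; T = 0 disables the bound.
def within_band(i, j, T):
    return T == 0 or abs(i - j) < T

print(estimate_sigma([np.arange(10.0), np.arange(11.0)]))
# With T=1 only the diagonal survives, so two series of different lengths
# can never reach the final corner cell and the kernel value collapses to 0:
print([(i, j) for i in range(1, 4) for j in range(1, 4) if within_band(i, j, 1)])
# [(1, 1), (2, 2), (3, 3)]
```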

### Playing GAK

#### Modifying test.py into my_test.py

```python
import numpy as np
import global_align as ga

def test(d=1, T1=10, T2=11):
    # two hand-crafted time series of dimension d and length T1, resp. T2
    seq1 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0]])
    seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [7.0], [8.0], [9.0]])
    #seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0], [10.0]])
    # define the sigma parameter
    sigma = 0.5 * (T1 + T2) / 2 * np.sqrt((T1 + T2) / 2)
    # compute the global alignment kernel value for different triangular parameters
    diff_t = np.abs(T1 - T2)
    for triangular in range(0, 11):
        val = ga.tga_dissimilarity(seq1, seq2, sigma, triangular)
        kval = np.exp(-val)
        if 0 < triangular <= diff_t:
            # for 0 < triangular <= diff_t, exp(-tga_d) == 0
            assert kval == 0
        print("T=%d \t exp(-tga_d)=%0.5f" % (triangular, kval))
```


#### Results

• Case where the lengths are the same and one series stops (repeats its last value).
input:

```python
seq1 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0], [9.0]])
seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0], [10.0]])
```


result:

```
T=0 	 exp(-tga_d)=0.99701
T=1 	 exp(-tga_d)=0.99700
T=2 	 exp(-tga_d)=0.99700
T=3 	 exp(-tga_d)=0.99700
T=4 	 exp(-tga_d)=0.99701
T=5 	 exp(-tga_d)=0.99701
T=6 	 exp(-tga_d)=0.99701
T=7 	 exp(-tga_d)=0.99701
T=8 	 exp(-tga_d)=0.99701
T=9 	 exp(-tga_d)=0.99701
T=10 	 exp(-tga_d)=0.99701
```

• Case where the lengths differ and the longer series keeps changing.
input:

```python
seq1 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0]])
seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0], [10.0]])
```


result:

```
T=0 	 exp(-tga_d)=0.94617
T=1 	 exp(-tga_d)=0.00000
T=2 	 exp(-tga_d)=0.70536
T=3 	 exp(-tga_d)=0.86143
T=4 	 exp(-tga_d)=0.91535
T=5 	 exp(-tga_d)=0.93685
T=6 	 exp(-tga_d)=0.94418
T=7 	 exp(-tga_d)=0.94590
T=8 	 exp(-tga_d)=0.94615
T=9 	 exp(-tga_d)=0.94617
T=10 	 exp(-tga_d)=0.94617
```

• Case where one time series slows down in the middle of the change.
input:

```python
seq1 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0]])
seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [7.0], [8.0], [9.0]])
```


result:

```
T=0 	 exp(-tga_d)=0.95889
T=1 	 exp(-tga_d)=0.00000
T=2 	 exp(-tga_d)=0.70348
T=3 	 exp(-tga_d)=0.86528
T=4 	 exp(-tga_d)=0.92346
T=5 	 exp(-tga_d)=0.94772
T=6 	 exp(-tga_d)=0.95639
T=7 	 exp(-tga_d)=0.95853
T=8 	 exp(-tga_d)=0.95886
T=9 	 exp(-tga_d)=0.95889
T=10 	 exp(-tga_d)=0.95889
```

• Case where the two time series slow down at different points.
input:

```python
seq1 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0], [9.0]])
seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [7.0], [8.0], [9.0]])
```


result:

```
T=0 	 exp(-tga_d)=0.99494
T=1 	 exp(-tga_d)=0.99401
T=2 	 exp(-tga_d)=0.99291
T=3 	 exp(-tga_d)=0.99401
T=4 	 exp(-tga_d)=0.99460
T=5 	 exp(-tga_d)=0.99483
T=6 	 exp(-tga_d)=0.99491
T=7 	 exp(-tga_d)=0.99493
T=8 	 exp(-tga_d)=0.99494
T=9 	 exp(-tga_d)=0.99494
T=10 	 exp(-tga_d)=0.99494
```


#### Consideration

• $T=0$ (no bound) and $T=9$ give the same values in all examples: no bound includes every alignment, and $T=9$ is wide enough that any alignments it still excludes contribute nothing visible at this precision.
• $T=1$ does not allow any shifting, hence the examples with time series of different lengths yield 0 for $T=1$.
• In every example, the difference between $T=2$ and $T=3$ reflects only the sum of non-dominating terms, because a shift of $1$ is already enough to produce the best matching in these examples.