# Global Alignment Kernels

Posted on April 11, 2017 by Shota Nagayama

# Global Alignment Kernels (GAK)

A method that makes distances (similarities) between complex data points, such as time-series data, computable.

• In effect, it newly defines a distance on the space.
• GAK is a kernel method for calculating the distance between two time-series data points.
• Time series sometimes stop, slow down, or accelerate along the time axis.
• A kernel method is a mathematically proven shortcut, in terms of computational effort, for calculating the distance between two points in a space of any dimension; it works as long as Mercer’s theorem is satisfied.
• Differences in any direction reduce the similarity.

## Kernel method

### Kernel

Kernel $$k(x_i, x_j)$$ is the inner product $$\phi(x_i) \cdot \phi(x_j)$$ of two vectors in a higher-dimensional space, mapped from the two vectors $$x_i, x_j$$ in a lower-dimensional space.
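
As a concrete illustration (my own, not from the GAK code), the degree-2 polynomial kernel on $$\mathbb{R}^2$$ is exactly an inner product in a 3-dimensional feature space:

import numpy as np

def phi(x):
    # explicit feature map for the degree-2 polynomial kernel on R^2
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def poly_kernel(x, y):
    # k(x, y) = (x . y)^2, evaluated without ever constructing phi
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])
print(np.dot(phi(x), phi(y)))  # inner product in the feature space: 16.0
print(poly_kernel(x, y))       # same value, computed directly in R^2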

### Positive semidefinite kernel

• $$k(x_i, x_j) = k(x_j, x_i)$$
• every eigenvalue of the Gram matrix $$\begin{pmatrix} k(x_1, x_1) & \cdots & k(x_1, x_n) \\ \vdots & \ddots & \vdots \\ k(x_n, x_1) & \cdots & k(x_n, x_n) \end{pmatrix}$$ is greater than or equal to 0 (see the numerical check below).
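
A quick numerical sanity check of this property (my own sketch, not from the GAK code), using a Gaussian kernel on a few random points:

import numpy as np

def gaussian_kernel(xi, xj, sigma=1.0):
    # k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 * sigma^2))
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

np.random.seed(0)
X = np.random.randn(5, 3)  # 5 points in R^3

# Gram matrix K with K[i, j] = k(x_i, x_j)
K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])

eigvals = np.linalg.eigvalsh(K)  # eigenvalues of the symmetric Gram matrix
print(eigvals)                   # all >= 0 (up to numerical error)
assert np.all(eigvals >= -1e-10)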

### Reproducing Kernel Hilbert Space (RKHS)

• $$\forall x \in \Omega$$ (the input space), $$\exists \phi_x \in H$$ (a higher-dimensional function space)
• $$\forall f \in H, \langle f, \phi_x \rangle = f(x)$$
$$\phi_x$$ is called the “reproducing kernel”.

### Positive semidefinite kernel and RKHS

• Conversely, a positive semidefinite kernel defines an RKHS.
• An inner product in $$H$$ can be calculated simply by evaluating the kernel function (the kernel trick), as sketched below.
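
For example, if $$f = \sum_i a_i \phi_{x_i}$$ and $$g = \sum_j b_j \phi_{y_j}$$, then $$\langle f, g \rangle = \sum_{i,j} a_i b_j k(x_i, y_j)$$, so only kernel evaluations are needed. A minimal sketch (my own illustration, reusing the Gaussian kernel above):

import numpy as np

def gaussian_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

np.random.seed(0)
X = np.random.randn(4, 2)            # expansion points of f
Y = np.random.randn(3, 2)            # expansion points of g
a = np.array([0.5, -1.0, 2.0, 0.1])  # coefficients of f
b = np.array([1.0, 0.3, -0.7])       # coefficients of g

# <f, g>_H = sum_{i,j} a_i * b_j * k(x_i, y_j): only kernel values are needed
K = np.array([[gaussian_kernel(x, y) for y in Y] for x in X])
print(a.dot(K).dot(b))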

ref (Japanese): カーネル法 正定値カーネルを用いたデータ解析 (Kernel Methods: Data Analysis Using Positive Definite Kernels), 朱鷺の杜Wiki (Ibisforest Wiki)

## Algorithm

1. stretch/warp the time axes of the two series arbitrarily and individually
2. create the full set of “alignments”, each consisting of two index sequences (produced by the warping)
• an alignment is, for example: assume two time series $$x$$ and $$y$$ of length 5,
$$x = (0.0, 1.0, 2.0, 3.0, 3.0)$$
$$y = (0.0, 1.0, 1.0, 2.0, 3.0)$$,
and warp (align) them into similar sequences, such as
$$x' = (0.0, 1.0, 1.0, 2.0, 3.0, 3.0)$$
$$y' = (0.0, 1.0, 1.0, 2.0, 3.0, 3.0)$$,
then the values in $$x'$$ and $$y'$$ can be described with the indices of $$x$$ and $$y$$ as $$\pi_1 = (1, 2, 2, 3, 4, 5)$$
$$\pi_2 = (1, 2, 3, 4, 5, 5)$$,
hence the alignment $$\pi$$ is
$$\pi = ((1,1), (2,2), (2,3), (3,4), (4,5), (5,5))$$.
The increment ratios $$\left(\frac{\pi_1(i+1) - \pi_1(i)}{\pi_2(i+1) - \pi_2(i)}\right)$$ are $$\left(\frac{1}{1}, \frac{0}{1}, \frac{1}{1}, \frac{1}{1}, \frac{1}{0}\right)$$.
To avoid meaningless warping, the restriction $$\frac{\pi_1(i+1) - \pi_1(i)}{\pi_2(i+1) - \pi_2(i)} \in \left\{\frac{0}{1}, \frac{1}{0}, \frac{1}{1}\right\}$$ is imposed.
3. compare the aligned values to calculate the similarity
• Dynamic Time Warping (DTW): search for the single alignment with the highest similarity (lowest cost) between the two sequences. DTW is an older method for comparing two time series.
• GAK: sum the exponentiated, sign-changed costs over all possible alignments (a sketch of both scores follows this list).
• Note that only the values of the corresponding data points are involved in the calculation, not their time stamps.
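
To make step 3 concrete, below is a minimal sketch (my own simplification, not the reference implementation used later), assuming a Gaussian local similarity and no triangular bound: DTW keeps only the best alignment, while GAK sums the contributions of all admissible alignments.

import numpy as np

def dtw_distance(x, y):
    # classic DTW: minimum total squared distance over all admissible alignments
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.sum((x[i - 1] - y[j - 1]) ** 2)
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def gak(x, y, sigma=1.0):
    # global alignment kernel: sum over all admissible alignments of the
    # product of local Gaussian similarities exp(-||x_i - y_j||^2 / (2*sigma^2))
    n, m = len(x), len(y)
    M = np.zeros((n + 1, m + 1))
    M[0, 0] = 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            k_ij = np.exp(-np.sum((x[i - 1] - y[j - 1]) ** 2) / (2.0 * sigma ** 2))
            M[i, j] = k_ij * (M[i - 1, j] + M[i, j - 1] + M[i - 1, j - 1])
    return M[n, m]

x = np.array([[0.0], [1.0], [2.0], [3.0], [3.0]])
y = np.array([[0.0], [1.0], [1.0], [2.0], [3.0]])
print(dtw_distance(x, y))  # smaller means more similar (best single alignment)
print(gak(x, y))           # larger means more similar (sum over all alignments)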

## Implementation

### Parameters

• Kernel bandwidth $$\sigma$$:
In my_test.py it is set as `sigma = 0.5*(T1+T2)/2*np.sqrt((T1+T2)/2)`. The web site says “The Bandwidth sigma can be set as a multiple of a simple estimate of the median distance of different points observed in different time-series of your training set, scaled by the square root of the median length of time-series in the training set”.
I am not sure yet; the implementation does not seem to match the quoted explanation (see the sketch after this list).
• The triangular parameter $$T$$ bounds the maximum allowed difference between aligned indices, i.e., $$-T < \pi_1(i) - \pi_2(i) < T$$. $$T=0$$ means no bound.
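
My reading of the quoted bandwidth heuristic, as a hypothetical helper (this `estimate_sigma` is my own sketch, not part of the reference code):

import numpy as np

def estimate_sigma(seqs, multiple=1.0):
    # seqs: list of training time series, each a numpy array of shape (T_i, d)
    # median distance between points observed in *different* time series ...
    dists = []
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            for a in seqs[i]:
                for b in seqs[j]:
                    dists.append(np.linalg.norm(a - b))
    med_dist = np.median(dists)
    # ... scaled by the square root of the median time-series length
    med_len = np.median([len(s) for s in seqs])
    return multiple * med_dist * np.sqrt(med_len)

Note that the formula in my_test.py uses only the lengths T1 and T2, not any distances, which is why the two do not obviously match.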

### Playing GAK

#### Modified test.py to my_test.py

import numpy as np
import global_align as ga

def test(d=1, T1=10, T2=11):
    np.random.seed(1)
    # hard-coded time series of dimension d and length T1, resp. T2
    # (the original test.py generated random ones)
    seq1 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0]])
    seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [7.0], [8.0], [9.0]])
    #seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0], [10.0]])
    # define the sigma parameter
    sigma = 0.5*(T1+T2)/2*np.sqrt((T1+T2)/2)
    # compute the global alignment kernel value for different triangular parameters
    diff_t = np.abs(T1-T2)
    Ts = range(0, 11)
    for triangular in Ts:
        val = ga.tga_dissimilarity(seq1, seq2, sigma, triangular)
        kval = np.exp(-val)
        if 0 < triangular <= diff_t:
            # for 0 < triangular <= diff_t, exp(-tga_d) == 0
            assert kval == 0
        print "T=%d \t exp(-tga_d)=%0.5f" % (triangular, kval)

#### Results

• Case where the lengths are the same and the last element of one time series stops (repeats).
input:

seq1 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0], [9.0]])
seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0], [10.0]])

result:

T=0 	 exp(-tga_d)=0.99701
T=1 	 exp(-tga_d)=0.99700
T=2 	 exp(-tga_d)=0.99700
T=3 	 exp(-tga_d)=0.99700
T=4 	 exp(-tga_d)=0.99701
T=5 	 exp(-tga_d)=0.99701
T=6 	 exp(-tga_d)=0.99701
T=7 	 exp(-tga_d)=0.99701
T=8 	 exp(-tga_d)=0.99701
T=9 	 exp(-tga_d)=0.99701
T=10 	 exp(-tga_d)=0.99701
• Case where the lengths are different and the longer one keeps changing.
input:

seq1 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0]])
seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0], [10.0]])

result:

T=0 	 exp(-tga_d)=0.94617
T=1 	 exp(-tga_d)=0.00000
T=2 	 exp(-tga_d)=0.70536
T=3 	 exp(-tga_d)=0.86143
T=4 	 exp(-tga_d)=0.91535
T=5 	 exp(-tga_d)=0.93685
T=6 	 exp(-tga_d)=0.94418
T=7 	 exp(-tga_d)=0.94590
T=8 	 exp(-tga_d)=0.94615
T=9 	 exp(-tga_d)=0.94617
T=10 	 exp(-tga_d)=0.94617
• Case where one time series slows down while changing.
input:

seq1 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0]])
seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [7.0], [8.0], [9.0]])

result:

T=0 	 exp(-tga_d)=0.95889
T=1 	 exp(-tga_d)=0.00000
T=2 	 exp(-tga_d)=0.70348
T=3 	 exp(-tga_d)=0.86528
T=4 	 exp(-tga_d)=0.92346
T=5 	 exp(-tga_d)=0.94772
T=6 	 exp(-tga_d)=0.95639
T=7 	 exp(-tga_d)=0.95853
T=8 	 exp(-tga_d)=0.95886
T=9 	 exp(-tga_d)=0.95889
T=10 	 exp(-tga_d)=0.95889
• Case where the two time series slow down at different points.
input:

seq1 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0], [9.0]])
seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [7.0], [8.0], [9.0]])

result:

T=0 	 exp(-tga_d)=0.99494
T=1 	 exp(-tga_d)=0.99401
T=2 	 exp(-tga_d)=0.99291
T=3 	 exp(-tga_d)=0.99401
T=4 	 exp(-tga_d)=0.99460
T=5 	 exp(-tga_d)=0.99483
T=6 	 exp(-tga_d)=0.99491
T=7 	 exp(-tga_d)=0.99493
T=8 	 exp(-tga_d)=0.99494
T=9 	 exp(-tga_d)=0.99494
T=10 	 exp(-tga_d)=0.99494

#### Consideration

• No bound ($$T=0$$) and $$T=9$$ give the same result in all the examples. This is because no bound includes all alignments, and $$T=9$$ already covers all alignments that contribute meaningfully in these examples.
• $$T=1$$ does not allow any shift, hence the examples with time series of different lengths yield 0 for $$T=1$$.
• In every example, the difference between $$T=2$$ and $$T=3$$ (and larger) indicates the sum of the non-dominating terms, because a shift of at most $$1$$ is already enough to produce the best-matching alignment in these examples.