Global Alignment Kernels

Posted on April 11, 2017 by Shota Nagayama
Tags: AI, Global Alignment Kernels

Global Alignment Kernels (GAK)

A method which makes distances (similarities) of so complex data points such as time-sequential data calculable.

Kernel method

Kernel

Kernel \(k(x_i, x_j)\) is an inner product of two vectors \(\phi(x_i) \cdot \phi(x_j)\) in a higher dimentional space mapped from two vectors \(x_i, x_j\) in a lower dimentional space.

Positive semidefinite kernel

Reproducing Kernel Hilbert Space (RKHS)

Positive semidefinite kernel and RKHS

ref (Japanese): カーネル法 正定値カーネルを用いたデータ解析, 朱鷺の杜Wiki

Algorithm

  1. stretch/warp the time-axises of the two data arbitrarily and individually
  2. create a full set of “alignments”, which consists of two sequences of shifts (produced by the warping)
    • an alignment is for example, assuming two time-series data \(x\) and \(y\) of length 5,
      \(x = (0.0, 1.0, 2.0, 3.0, 3.0)\)
      \(y = (0.0, 1.0, 1.0, 2.0, 3.0)\),
      and they are warped (aligned) to be the similar sequences, such as
      \(x' = (0.0, 1.0, 1.0, 2.0, 3.0, 3.0)\)
      \(y' = (0.0, 1.0, 1.0, 2.0, 3.0, 3.0)\),
      then the alignments of values in \(x'\) and \(y'\) can be described with the indices of \(x\) and \(y\) as \(\pi_1 = (1, 2, 2, 3, 4, 5)\)
      \(\pi_2 = (1, 2, 3, 3, 4, 5)\),
      hence the alignment \(\pi\) is
      \(\pi = ((1,1), (2,2), (2,3), (3,3), (4,4), (5,5))\).
      \((\frac{\pi_1(i+1) - \pi_1(i)}{\pi_2(i+1) - \pi_2(i)})\) are \((\frac{1}{1}, \frac{0}{1}, \frac{1}{0}, \frac{1}{1}, \frac{1}{1})\).
      To avoid meaningless warping, restriction \((\frac{\pi_1(i+1) - \pi_1(i)}{\pi_2(i+1) - \pi_2(i)})\in ({\frac{0}{1}), (\frac{1}{0}), (\frac{1}{1}})\) is given.
  3. compare the aligned values to calculate the similarity
    • Dynamic Time Warping(DTW): search a set which has the biggest similarity between the two sequences of alignments. DTW is a old method to compare two time-series data.
    • GAK: Sum the exponentiated and sign changed similarities of every pairs.
    • Actually the values excluding the time values of corresponding data points are involved in the calculation.

Implementation

Parameters

Playing GAK

Modified test.py to my_test.py

import numpy as np
import global_align as ga

def test(d=1, T1=10, T2=11):
  np.random.seed(1)
  # generate two random time series of dimension d and duration T1, resp. T2
  seq1 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0]])
  seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [7.0], [8.0], [9.0]])
  #seq2 = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0], [9.0], [10.0]])
  # define the sigma parameter
  sigma = 0.5*(T1+T2)/2*np.sqrt((T1+T2)/2)
  # compute the global alignment kernel value for different triangular parameters
  diff_t = np.abs(T1-T2)
  Ts = range(0, 11)
  for triangular in Ts:
    val = ga.tga_dissimilarity(seq1, seq2, sigma, triangular)
    kval = np.exp(-val)
    if 0 < triangular <= diff_t:
      # for 0 < triangular <= diff_t, exp(-tga_d) == 0
      assert kval == 0
    print "T=%d \t exp(-tga_d)=%0.5f" % (triangular, kval)

Results

Consideration