Fundamentals of the KRLS Algorithm

2016/12/2
Introduction
• Kernel machines are a relatively new class of learning algorithms (2003–2004) that use Mercer kernels to produce non-linear versions of conventional linear supervised and unsupervised learning algorithms.
Kernel methods
• Problems that are hard in a low-dimensional input space may become much easier if the data is mapped to a high-dimensional feature space.
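As a small illustration of this idea (a toy example added here, not from the original slides): four 1-D points that no single threshold can separate become linearly separable under the explicit feature map φ(x) = (x, x²), and a kernel returns the same feature-space inner products without ever forming φ:

```python
import numpy as np

# 1-D points whose labels no threshold on x alone can separate:
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([1, -1, -1, 1])            # outer points vs. inner points

# Explicit feature map phi(x) = (x, x^2): in 2-D the classes are
# separated by the horizontal line x^2 = 2.5.
phi = np.stack([x, x ** 2], axis=1)

# The kernel trick: k(u, v) = <phi(u), phi(v)> = u*v + (u*v)^2,
# evaluated without constructing phi explicitly.
def k(u, v):
    return u * v + (u * v) ** 2

# The kernel value matches the explicit feature-space inner product.
assert np.isclose(k(x[0], x[1]), phi[0] @ phi[1])
```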
Online sparsification
• To avoid adding the training sample $x_t$ to the dictionary, we need to find coefficients $a_t = (a_1, \dots, a_{m_{t-1}})^T$ satisfying the approximate linear dependence (ALD) condition:

$$\delta_t = \min_{a} \Big\| \sum_{j=1}^{m_{t-1}} a_j \phi(\tilde{x}_j) - \phi(x_t) \Big\|^2 \le \nu,$$

where $\nu$ is the sparsity level parameter.
• If $\delta_t \le \nu$, $\phi(x_t)$ can be approximated within a squared error $\nu$ by some linear combination of dictionary instances.
• Using $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$, the ALD condition can be written as

$$\delta_t = \min_{a} \big\{ a^T \tilde{K}_{t-1} a - 2 a^T \tilde{k}_{t-1}(x_t) + k_{tt} \big\} \le \nu,$$

where $[\tilde{K}_{t-1}]_{i,j} = k(\tilde{x}_i, \tilde{x}_j)$ is the kernel matrix calculated with the dictionary samples, $(\tilde{k}_{t-1}(x_t))_i = k(\tilde{x}_i, x_t)$, and $k_{tt} = k(x_t, x_t)$, for $i, j = 1, \dots, m_{t-1}$.
• The solution of the ALD problem is given by

$$a_t = \tilde{K}_{t-1}^{-1} \tilde{k}_{t-1}(x_t),$$

for which we have

$$\delta_t = k_{tt} - \tilde{k}_{t-1}(x_t)^T a_t.$$
• Otherwise, if $\delta_t > \nu$, the current dictionary must be expanded by adding $x_t$. Thereby, $\mathcal{D}_t = \mathcal{D}_{t-1} \cup \{x_t\}$ and $m_t = m_{t-1} + 1$.
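A minimal NumPy sketch of this ALD test (illustrative only; the variable names and the convention of caching the inverse of the dictionary kernel matrix are assumptions, not notation from the slides):

```python
import numpy as np

def ald_test(K_inv, k_vec, k_tt, nu):
    """ALD test of a new sample x_t against the current dictionary.

    K_inv : inverse of the dictionary kernel matrix, shape (m, m)
    k_vec : kernel values k(x_tilde_i, x_t), shape (m,)
    k_tt  : scalar k(x_t, x_t)
    nu    : sparsity level parameter
    """
    a_t = K_inv @ k_vec                # a_t = K_{t-1}^{-1} k_{t-1}(x_t)
    delta_t = k_tt - k_vec @ a_t       # delta_t = k_tt - k^T a_t
    return a_t, delta_t, (delta_t <= nu)
```

If the returned flag is True, φ(x_t) is represented by the coefficients a_t; otherwise x_t is appended to the dictionary and the cached inverse is enlarged accordingly.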
• Sparsity allows the solution to be stored in memory in a compact form and to be easily used later.
• The sparser the solution of a kernel algorithm, the less time and memory it requires.
Kernel RLS (kernel recursive least squares)
• A stream of training examples:

$$\{(x_1, y_1), (x_2, y_2), \dots, (x_t, y_t)\},$$

where $(x_t, y_t) \in \mathbb{R}^p \times \mathbb{R}$ denotes the current input-output pair.
• Loss function of the KRLS algorithm:

$$\mathcal{L}(w) = \sum_{i=1}^{t} \big( w^T \phi(x_i) - y_i \big)^2 = \big\| \Phi_t^T w - y_t \big\|^2,$$

where $\Phi_t = [\phi(x_1), \dots, \phi(x_t)]$ and $y_t = (y_1, \dots, y_t)^T$.
• Optimal weight vector:

$$w_t = \sum_{i=1}^{t} \alpha_i \phi(x_i) = \Phi_t \alpha_t,$$

where $\alpha_t = (\alpha_1, \dots, \alpha_t)^T$.
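• Substituting $w_t = \Phi_t \alpha_t$ into the loss gives $\big\| \Phi_t^T \Phi_t \alpha_t - y_t \big\|^2$, so every feature-space quantity enters only through inner products $\phi(x_i)^T \phi(x_j)$.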
• Then, the loss function of the KRLS can be rewritten as

$$\mathcal{L}(\alpha) = \big\| K_t \alpha_t - y_t \big\|^2, \qquad K_t = \Phi_t^T \Phi_t,$$

where $[K_t]_{i,j} = k(x_i, x_j)$.
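To make the kernel-only form concrete, the following batch sketch solves $\min_\alpha \|K_t \alpha - y_t\|^2$ directly (this is not the recursive algorithm of the next section; the Gaussian kernel, the bandwidth `sigma`, and the small ridge term `reg` are illustrative assumptions):

```python
import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    """Pairwise Gaussian kernel k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def batch_kernel_ls(X, y, sigma=1.0, reg=1e-8):
    # Minimize ||K_t alpha - y_t||^2; the tiny ridge keeps K_t invertible
    # when samples are (nearly) linearly dependent in feature space.
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + reg * np.eye(len(X)), y)

def predict(X_train, alpha, X_new, sigma=1.0):
    # f(x) = sum_i alpha_i k(x_i, x), i.e. w_t^T phi(x) with w_t = Phi_t alpha_t.
    return gaussian_kernel(X_new, X_train, sigma) @ alpha
```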
The Kernel RLS Algorithm