If the objective function is quadratic the corresponding Hessian is diagonal and hence is invertible in linear time. Since many functions are well approximated by a quadratic function locally, it makes sense to multiply the negative gradient by the inverse of the diagonal portion of the Hessian:

[equation not legible in the source]

Note that it is not necessary to explicitly compute the Hessian, since the entries of the scaling matrix D are given by:

D_ii = { [not legible] for 1 <= i <= n; [not legible] for i = n + 1 }    (11)

The run time complexity to compute the scaled gradient step hence is O(n * |sv|).

To invert the Hessian it is sufficient to compute the Cholesky decomposition of K_sv,sv + λI, where K_sv,sv denotes the subset of the kernel matrix with respect to the support vectors. The resulting run time complexity to compute the Newton step is therefore O(|sv|^3).

In the online setting it is possible to incrementally update the inverse Hessian required by the Newton method after arrival of the next pattern in S by exploiting the well-known Sherman-Morrison-Woodbury formula:

(A + UV^T)^{-1} = A^{-1} - A^{-1} U (I + V^T A^{-1} U)^{-1} V^T A^{-1}    (13)

This approach has been previously applied in incremental SVM learning [7, 8]. According to (13), exchanging one column and row of the inverse requires O(n^2) multiplications. Yet, the worst case complexity of incremental updates is O(n^3) if, for example, all the patterns in the buffer leave the set of support vectors upon arrival of the next pattern. Since this worst case scenario occurs frequently in practice,
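The diagonally scaled gradient step described above can be sketched as follows; the toy quadratic objective and all names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def scaled_gradient_step(x, grad, hess_diag, eta=1.0):
    """One scaled gradient step: multiply the negative gradient
    by the inverse of the diagonal portion of the Hessian."""
    return x - eta * grad / hess_diag

# Toy quadratic f(x) = 0.5 * x^T diag(d) x, whose minimum is at 0.
d = np.array([1.0, 10.0, 100.0])   # diagonal Hessian entries
x = np.array([1.0, 1.0, 1.0])
grad = d * x                       # gradient of the toy quadratic
x_new = scaled_gradient_step(x, grad, d)
print(x_new)                       # one step reaches the minimum: [0. 0. 0.]
```

Because the toy objective is exactly quadratic with a diagonal Hessian, a single scaled step lands on the minimizer, while plain gradient descent would need a step size tuned to the worst-conditioned coordinate.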
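The Cholesky-based Newton step can be sketched as follows; the Gaussian kernel, the random data, and the function names are illustrative assumptions, and only the factorization of K_sv,sv + λI mirrors the text:

```python
import numpy as np

def newton_direction(K_sv, lam, grad):
    """Solve (K_sv + lam*I) p = -grad via a Cholesky factorization,
    applying the inverse Hessian without ever forming it explicitly."""
    n = K_sv.shape[0]
    L = np.linalg.cholesky(K_sv + lam * np.eye(n))  # O(|sv|^3)
    y = np.linalg.solve(L, -grad)                   # forward substitution
    return np.linalg.solve(L.T, y)                  # backward substitution

# Illustrative RBF kernel matrix over random "support vectors".
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq)
g = rng.standard_normal(5)
p = newton_direction(K, 0.1, g)
# p satisfies (K + 0.1*I) p = -g
```

The regularization term λI guarantees the factorized matrix is positive definite even when the kernel matrix itself is only positive semi-definite.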
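The rank-one special case of the Sherman-Morrison-Woodbury identity can be sketched as follows; names and data are illustrative, and this shows only the O(n^2) rank-one inverse update, not the paper's full row/column-exchange scheme:

```python
import numpy as np

def sherman_morrison(A_inv, u, v):
    """Update A^{-1} to (A + u v^T)^{-1} using O(n^2) multiplications,
    avoiding a fresh O(n^3) inversion."""
    Au = A_inv @ u
    vA = v @ A_inv
    return A_inv - np.outer(Au, vA) / (1.0 + v @ Au)

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)  # keep A well conditioned
u = rng.standard_normal(4)
v = rng.standard_normal(4)
updated = sherman_morrison(np.linalg.inv(A), u, v)
# 'updated' matches inverting the perturbed matrix directly
```

Adding or removing a pattern changes one row and one column of the stored inverse, which can be expressed through a small number of such rank-one corrections.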
[Figure: plots not recoverable from the extraction; the legends compare Newton, scaled gradient, and gradient steps over iterations on a logarithmic scale.]

Fig. 6. Average iteration time (top) and prediction accuracy (bottom) for the online algorithms NORMA, SILK and PRIONA.

7. REFERENCES

[1] B. Schölkopf and A. J. Smola, Learning with Kernels, MIT Press, 2002.

[2] T. N. Lal, M. Schröder, T. Hinterberger, J. Weston, M. Bogdan, N. Birbaumer, and B. Schölkopf, "Support vector channel selection in BCI," IEEE Trans. Biomed. Eng., vol. 51, no. 6, pp. 1003-1010, 2004.

[3] M. J. Rasch, A. Gretton, Y. Murayama, W. Maass, and N. K. Logothetis, "Inferring spike trains from local field potentials," J. Neurophysiol., vol. 99, pp. 1461-1476, 2008.

[4] L. Shpigelman, Y. Singer, R. Paz, and E. Vaadia, "Spikernels: predicting arm movements by embedding population spike rate patterns in inner-product spaces," Neural Computation, vol. 17, pp. 671-690, 2005.

[5] D. Brugger, S. Butovas, M. Bogdan, C. Schwarz, and W. Rosenstiel, "Direct and inverse solution for a stimulus adaptation problem using SVR," in ESANN Proceedings, Bruges, 2008, pp. 397-402.

[6] D. Brugger, S. Butovas, M. Bogdan, C. Schwarz, and W. Rosenstiel, "Real-time adaptive microstimulation increases reliability of electrically evoked cortical potentials," Submitted to Nature, 2009.

[10] S. V. N. Vishwanathan, N. N. Schraudolph, and A. J. Smola, "Step size adaptation in reproducing kernel Hilbert space," Journal of Machine Learning Research, vol. 7, pp. 1107-1133, 2006.

[11] L. Cheng, S. V. N. Vishwanathan, D. Schuurmans, S. Wang, and T. Caelli, "Implicit online learning with kernels," in NIPS, pp. 249-256, MIT Press, 2007.

[12] A. Bordes, S. Ertekin, J. Weston, and L. Bottou, "Fast kernel classifiers with online and active learning," Journal of Machine Learning Research, vol. 6, pp. 1579-1619, 2005.

[13] O. Chapelle, "Training a support vector machine in the primal," Neural Computation, vol. 19, pp. 1135-1178, 2007.

[14] L. Bo, L. Wang, and L. Jiao, "Recursive finite Newton algorithm for support vector regression in the primal," Neural Computation, vol. 19, pp. 1082-1096, 2007.

[15] N. Aronszajn, "Theory of reproducing kernels," Transactions of the American Mathematical Society, vol. 68, no. 3, pp. 337-404, 1950.

[16] G. S. Kimeldorf and G. Wahba, "A correspondence between Bayesian estimation on stochastic processes and smoothing by splines," The Annals of Mathematical Statistics, vol. 41, no. 2, pp. 495-502, April 1970.

[17] J. L. Rojo-Álvarez, M. Martínez-Ramón, M. de Prado-Cumplido, A. Artés-Rodríguez, and A. R. Figueiras-Vidal, "Support vector method for robust ARMA system identification," IEEE Transactions on Signal Processing, vol. 52, no. 1, pp. 155-164, 2004.

[18] K. Crammer, J. S. Kandola, and Y. Singer, "Online classification on a budget," in NIPS, 2003.

[19] J. Weston, A. Bordes, and L. Bottou, "Online (and offline) on an even tighter budget," in Proc. of the 10th Int. Workshop on Artificial Intelligence and Statistics, 2005, pp. 413-420.

[20] S. S. Keerthi and D. DeCoste, "A modified finite Newton method for fast solution of large scale linear SVMs," Journal of Machine Learning Research, vol. 6, pp. 341-361, 2005.