IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 32, NO. 1, JANUARY 2010
INTRODUCTION
YANG ET AL.: A BOOSTING FRAMEWORK FOR VISUALITY-PRESERVING DISTANCE METRIC LEARNING AND ITS APPLICATION TO...
RELATED WORK
3.1 Preliminaries
We denote by $\mathcal{D} = \{x_1, x_2, \ldots, x_n\}$ the collection of data points. Each $x \in \mathbb{R}^d$ is a vector of $d$ dimensions. We denote by $X = (x_1, x_2, \ldots, x_n)$ the data matrix containing the input features of both the labeled and the unlabeled examples. Following [22], we assume that the side information is available in the form of labeled pairs, i.e., whether or not two examples belong to the same semantic category. For convenience of discussion, below we refer to examples in the same category as similar examples and to examples in different categories as dissimilar examples. Let the set of labeled example pairs be denoted by

$$\mathcal{P} = \big\{(x_i, x_j, y_{i,j}) \;\big|\; x_i \in \mathcal{D},\; x_j \in \mathcal{D},\; y_{i,j} \in \{-1, 0, +1\}\big\},$$

where the class label $y_{i,j}$ is positive (i.e., $+1$) when $x_i$ and $x_j$ are similar, and negative (i.e., $-1$) when $x_i$ and $x_j$ are dissimilar; $y_{i,j}$ is set to zero when the example pair $(x_i, x_j)$ is unlabeled. Finally, we denote by $d(x_i, x_j)$ the distance function between $x_i$ and $x_j$. Our goal is to learn a distance function that is consistent with the labeled pairs in $\mathcal{P}$.
Remark 1. Note that standard labeled examples can always be converted into a set of labeled pairs by assigning two data points from the same category to the positive class and two data points from different categories to the negative class. Such pairwise class labels are commonly employed in multiclass multimedia retrieval applications [55], [56]. It is important to emphasize that the reverse is typically difficult: it is usually not possible to infer the unique category labels of examples from the labeled pairs alone [57].
Remark 2. We label two images in the training set as similar if they either match in semantic category or appear visually related, since our goal is to preserve both semantic relevance and visual similarity simultaneously. Alternative definitions are possible: for instance, two images could be deemed similar only if they belong to the same semantic category, or similarity could be defined purely by the images' visual resemblance according to human perception.
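The conversion described in Remark 1 is mechanical; a minimal sketch in Python (the function name and toy labels are hypothetical, not from the paper):

```python
import itertools

def make_labeled_pairs(labels):
    """Convert per-example category labels into pairwise side information.

    Returns a dict mapping each index pair (i, j), i < j, to the pair
    label y_ij: +1 for a must-link pair (same category), -1 for a
    cannot-link pair (different categories).
    """
    pairs = {}
    for i, j in itertools.combinations(range(len(labels)), 2):
        pairs[(i, j)] = 1 if labels[i] == labels[j] else -1
    return pairs

# Four training images from two toy categories.
P = make_labeled_pairs(["mass", "mass", "normal", "normal"])
```

As Remark 1 points out, the reverse mapping is lossy: swapping the two category names produces exactly the same pair labels, so the original categories cannot be uniquely recovered from `P`.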
The learned distance function is a weighted combination of binary features $f_t : \mathbb{R}^d \to \{-1, +1\}$:

$$d(x_i, x_j) = \sum_{t=1}^{T} \alpha_t \big(f_t(x_i) - f_t(x_j)\big)^2,$$

where $\alpha_t \ge 0$ are the combination weights. The distance metric is learned by minimizing the objective

$$F(\mathcal{P}) = \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} I(y_{i,j} = 1)\, I(y_{i,k} = -1)\, \exp\big(d(x_i, x_j) - d(x_i, x_k)\big) = \sum_{i,j,k=1}^{n} I(y_{i,j} = 1)\, I(y_{i,k} = -1)\, \exp\big(d(x_i, x_j) - d(x_i, x_k)\big),$$

which penalizes any example $x_i$ whose distance to a similar example $x_j$ is not smaller than its distance to a dissimilar example $x_k$.
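For concreteness, the distance and the objective can be evaluated directly from their definitions (a toy numeric sketch, assuming a single binary feature; none of the names below come from the paper):

```python
import math

def distance(xi, xj, alphas, features):
    # d(x_i, x_j) = sum_t alpha_t * (f_t(x_i) - f_t(x_j))**2
    return sum(a * (f(xi) - f(xj)) ** 2 for a, f in zip(alphas, features))

def objective(X, y, alphas, features):
    # F(P) = sum over triples (i, j, k) with y[i,j] = +1 and y[i,k] = -1
    #        of exp(d(x_i, x_j) - d(x_i, x_k))
    n = len(X)
    total = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if y.get((i, j)) == 1 and y.get((i, k)) == -1:
                    total += math.exp(distance(X[i], X[j], alphas, features)
                                      - distance(X[i], X[k], alphas, features))
    return total
```

Increasing the weight on a binary feature that agrees with the pair labels drives the objective down, which is exactly the effect the boosting procedure exploits.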
At iteration $t$, the objective can be written as

$$F_t(\mathcal{P}) = \sum_{i,j,k=1}^{n} I(y_{i,j} = 1)\, I(y_{i,k} = -1)\, \exp\big(d_{t-1}(x_i, x_j) - d_{t-1}(x_i, x_k)\big)\exp\big(\alpha_t\big[(f_t(x_i) - f_t(x_j))^2 - (f_t(x_i) - f_t(x_k))^2\big]\big).$$

To simplify our expression, we introduce the following notations:

$$d_{i,j} = d_{t-1}(x_i, x_j), \quad f_i = f_t(x_i), \quad \gamma_{i,j} = I(y_{i,j} = 1)\exp(d_{i,j}), \quad \tilde\gamma_{i,j} = I(y_{i,j} = -1)\exp(-d_{i,j}). \tag{8}$$

With these notations, $F_t(\mathcal{P})$ becomes

$$F_t(\mathcal{P}) = \sum_{i,j,k=1}^{n} \gamma_{i,j}\, \tilde\gamma_{i,k}\, \exp\big(\alpha_t\big[(f_i - f_j)^2 - (f_i - f_k)^2\big]\big). \tag{12}$$

We further define

$$\nu_i = \sum_{j=1}^{n} \gamma_{i,j}, \qquad \tilde\nu_i = \sum_{j=1}^{n} \tilde\gamma_{i,j}, \tag{13}$$

$$S_{i,j} = \frac{1}{2}\gamma_{i,j}(\tilde\nu_i + \tilde\nu_j), \qquad \tilde S_{i,j} = \frac{1}{2}\tilde\gamma_{i,j}(\nu_i + \nu_j),$$

and denote by $L$ and $\tilde L$ the graph Laplacians of the similarity matrices $S$ and $\tilde S$, respectively.

Lemma 3.3. For any $\alpha > 0$ and any binary vector $f \in \{-1, +1\}^n$, the following inequality holds:

$$F_t(\mathcal{P}) \le F_{t-1}(\mathcal{P}) + \frac{\exp(8\alpha) - 1}{8}\, f^\top L f + \frac{\exp(-8\alpha) - 1}{8}\, f^\top \tilde L f. \tag{11}$$

The detailed proof of this lemma is given in Appendix B.

Remark. Since $\gamma_{i,j} \propto I(y_{i,j} = 1)$ by (8), the similarity $S$ depends only on the data points from the must-link pairs (equivalence constraints). Hence, $f^\top L f$ in (11) essentially measures the inconsistency between the binary vector $f$ and the must-link constraints. Similarly, $f^\top \tilde L f$ in (11) measures how well $f$ separates the cannot-link pairs (inequivalence constraints). Hence, the upper bound in (11) essentially computes the overall inconsistency between the labeled pairs and the binary vector $f$.

The resulting per-iteration reduction of the objective is characterized by

$$\delta_t = \frac{\big[\max\big(0,\; f^\top(\tilde L - L)f\big)\big]^2}{8\, n\, F_t(\mathcal{P})\big(\sqrt{\lambda_{\max}(L)} + \sqrt{\lambda_{\max}(\tilde L)}\big)^2}. \tag{15}$$
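The role of the Laplacian terms can be illustrated numerically; in this sketch the must-link similarity `S` is a plain 0/1 matrix rather than the $\gamma$-weighted one above (a simplifying assumption for readability):

```python
import numpy as np

def laplacian(S):
    # Graph Laplacian L = D - S, with D the diagonal degree matrix.
    return np.diag(S.sum(axis=1)) - S

# Must-link similarity on 4 points: {0,1} and {2,3} are must-link pairs.
S = np.zeros((4, 4))
S[0, 1] = S[1, 0] = 1.0
S[2, 3] = S[3, 2] = 1.0
L = laplacian(S)

f_good = np.array([1.0, 1.0, -1.0, -1.0])   # respects both must-links
f_bad  = np.array([1.0, -1.0, 1.0, -1.0])   # violates both must-links

# f^T L f = (1/2) * sum_ij S_ij (f_i - f_j)^2: zero for f_good,
# and 4 per violated must-link pair for f_bad.
```

A vector that assigns the same sign to both ends of every must-link pair has zero inconsistency, while each violated pair contributes $S_{ij}(f_i - f_j)^2 = 4$ to the quadratic form.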
The binary vector $f$ used at each iteration is obtained by relaxing the maximization of the bound reduction to the continuous domain:

$$\max_{f \in \mathbb{R}^n} \; f^\top(\tilde L - L)f, \tag{16}$$

whose solution is the maximal eigenvector $u$ of $\tilde L - L$, normalized so that

$$u^\top u = 1. \tag{18}$$

The continuous solution is then converted into a binary vector $\hat f$ by searching for the best threshold $b$:

$$\max_{b \in \mathbb{R}} \; \sqrt{\hat f^\top \tilde L \hat f} - \sqrt{\hat f^\top L \hat f}, \qquad \text{s.t.}\quad \hat f_i = \begin{cases} +1, & f_i > b, \\ -1, & f_i \le b. \end{cases} \tag{17}$$

Repeating this construction for $T$ iterations reduces the objective geometrically:

$$F_{T+1}(\mathcal{P}) \le F_0(\mathcal{P}) \prod_{t=0}^{T} (1 - \delta_t), \tag{19}$$

where

$$F_0(\mathcal{P}) = \sum_{i,j,k=1}^{n} I(y_{i,j} = 1)\, I(y_{i,k} = -1)$$

and $\delta_t$ is the reduction factor of (15) computed with the Laplacians $L_t$ and $\tilde L_t$ of iteration $t$; using the upper bound on $F_t(\mathcal{P})$ from Appendix D, it can be estimated as

$$\delta_t \ge \frac{\lambda_{\max}(\tilde L_t - L_t)^2}{8\, \lambda_{\max}(\tilde S_t + S_t)\big(\sqrt{\lambda_{\max}(L_t)} + \sqrt{\lambda_{\max}(\tilde L_t)}\big)^2}.$$
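Steps (16)-(18) and the subsequent choice of $\alpha$ can be sketched with NumPy (an illustrative implementation under the conventions above; the eigenvector relaxation loop and the cap on $\alpha$ for a zero denominator are assumptions, not the authors' exact procedure):

```python
import numpy as np

def laplacian(S):
    # Graph Laplacian L = D - S, with D the diagonal degree matrix.
    return np.diag(S.sum(axis=1)) - S

def best_binary_vector(L, Lt):
    """Relax max_f f^T (Lt - L) f to the top eigenvector (16, 18),
    then threshold it at the best cut b to recover f in {-1,+1}^n (17).
    L: must-link Laplacian, Lt: cannot-link Laplacian (both symmetric)."""
    w, V = np.linalg.eigh(Lt - L)
    u = V[:, np.argmax(w)]           # largest eigenvector, u^T u = 1
    best, best_f = -np.inf, None
    for b in np.sort(u):             # candidate thresholds from u itself
        f = np.where(u > b, 1.0, -1.0)
        score = np.sqrt(max(f @ Lt @ f, 0.0)) - np.sqrt(max(f @ L @ f, 0.0))
        if score > best:
            best, best_f = score, f
    return best_f

def optimal_alpha(L, Lt, f):
    # alpha = max(0, (1/16) * log(f^T Lt f / f^T L f))
    a, b = f @ Lt @ f, f @ L @ f
    if b <= 1e-12:
        return 1.0  # hypothetical cap when f is perfectly must-link consistent
    return max(0.0, float(np.log(a / b)) / 16.0)
```

On a clean two-cluster problem, the thresholded eigenvector recovers the cluster split exactly, and the must-link inconsistency of that split is zero.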
Fig. 2. Two images with the same semantic label (malignant masses in
this example) can look very different visually. In an ISADS application, it
is important for the system to retrieve examples that are both visually
and semantically similar.
APPLICATIONS
4.1
Fig. 4. Reduction of objective function and error rate over iterations (312 iterations in total). (a) Objective function. (b) Error rate errP. (c) Objective
function versus error rate errP.
4.3
TABLE 1
Comparison of the Classification Accuracy
4.4
TABLE 4
Area under the ROC Curve on the ImageCLEF Data Set, Obtained by the Proposed and Baseline Algorithms
APPENDIX A
PROOF OF LEMMA 3.2
To prove the inequality, we consider the following two cases.
APPENDIX B
PROOF OF LEMMA 3.3
To prove the inequality in (11), we first bound $\exp\big(\alpha[(f_i - f_j)^2 - (f_i - f_k)^2]\big)$ by the following expression:

$$\exp\big(\alpha[(f_i - f_j)^2 - (f_i - f_k)^2]\big) \le \frac{\exp\big(2\alpha(f_i - f_j)^2\big) + \exp\big(-2\alpha(f_i - f_k)^2\big)}{2},$$

which follows from $xy \le (x^2 + y^2)/2$ with $x = \exp(\alpha(f_i - f_j)^2)$ and $y = \exp(-\alpha(f_i - f_k)^2)$. Since $f_i, f_j \in \{-1, +1\}$, the quantity $(f_i - f_j)^2/4$ takes only the values 0 and 1. Hence, $\exp(2\alpha(f_i - f_j)^2)$ can be upper bounded by convexity as follows:

$$\exp\big(2\alpha(f_i - f_j)^2\big) = \exp\Big(8\alpha \cdot \frac{(f_i - f_j)^2}{4}\Big) \le \Big(1 - \frac{(f_i - f_j)^2}{4}\Big)\exp(0) + \frac{(f_i - f_j)^2}{4}\exp(8\alpha) = 1 + \frac{\exp(8\alpha) - 1}{4}(f_i - f_j)^2,$$

and, in the same way,

$$\exp\big(-2\alpha(f_i - f_k)^2\big) \le 1 + \frac{\exp(-8\alpha) - 1}{4}(f_i - f_k)^2.$$

Substituting both bounds into (12) and summing out the third index gives

$$F_t(\mathcal{P}) \le F_{t-1}(\mathcal{P}) + \frac{\exp(8\alpha) - 1}{8} \sum_{i,j=1}^{n} \Big(\gamma_{i,j} \sum_{k=1}^{n} \tilde\gamma_{i,k}\Big)(f_i - f_j)^2 + \frac{\exp(-8\alpha) - 1}{8} \sum_{i,k=1}^{n} \Big(\tilde\gamma_{i,k} \sum_{j=1}^{n} \gamma_{i,j}\Big)(f_i - f_k)^2.$$

By the symmetry of $\gamma$ and $\tilde\gamma$, the first weighted sum equals $\sum_{i,j=1}^{n} S_{i,j}(f_i - f_j)^2$ and the second equals $\sum_{i,k=1}^{n} \tilde S_{i,k}(f_i - f_k)^2$; writing these sums in terms of the Laplacians $L$ and $\tilde L$ of $S$ and $\tilde S$ yields the inequality in (11).

TABLE 6
Area under the ROC Curve on the Corel Data Set, Obtained by the Proposed and Baseline Algorithms
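Both elementary inequalities used in this proof are easy to check numerically (a sanity script, not part of the paper):

```python
import itertools
import math
import random

random.seed(0)

# Check exp(A - B) <= (exp(2A) + exp(-2B)) / 2 for random nonnegative A, B.
for _ in range(1000):
    A, B = random.uniform(0, 3), random.uniform(0, 3)
    assert math.exp(A - B) <= (math.exp(2 * A) + math.exp(-2 * B)) / 2 + 1e-12

# Check exp(2a(fi-fj)^2) <= 1 + (exp(8a) - 1) * (fi-fj)^2 / 4 for binary fi, fj.
for a in [0.01, 0.1, 0.5, 1.0]:
    for fi, fj in itertools.product([-1, 1], repeat=2):
        s = (fi - fj) ** 2           # takes only the values 0 or 4
        assert math.exp(2 * a * s) <= 1 + (math.exp(8 * a) - 1) * s / 4 + 1e-12
```

The second bound holds with equality at both endpoints $s = 0$ and $s = 4$, which is why the linear interpolation argument in the proof is tight for binary vectors.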
APPENDIX C
PROOF OF THEOREM 3.4
We first denote by $g(\alpha, f)$ the right-hand side of the inequality in (11), i.e.,

$$g(\alpha, f) = F_t(\mathcal{P}) + \frac{\exp(8\alpha) - 1}{8}\, f^\top L_t f + \frac{\exp(-8\alpha) - 1}{8}\, f^\top \tilde L_t f. \tag{21}$$

Setting the derivative with respect to $\alpha$ to zero,

$$\frac{\partial g(\alpha, f)}{\partial \alpha} = \exp(8\alpha)\, f^\top L_t f - \exp(-8\alpha)\, f^\top \tilde L_t f = 0,$$

we obtain the optimal $\alpha$, which is

$$\alpha = \max\Big(0,\; \frac{1}{16}\log f^\top \tilde L_t f - \frac{1}{16}\log f^\top L_t f\Big). \tag{22}$$

Substituting the optimal $\alpha$ back into $g$ gives

$$\min_{\alpha \ge 0} g(\alpha, f) = F_t(\mathcal{P}) - \frac{1}{8}\Big(\sqrt{f^\top \tilde L_t f} - \sqrt{f^\top L_t f}\Big)^2 = F_t(\mathcal{P}) - \frac{\big[\max\big(0,\; f^\top(\tilde L_t - L_t)f\big)\big]^2}{8\big(\sqrt{f^\top \tilde L_t f} + \sqrt{f^\top L_t f}\big)^2}.$$

Since $f^\top f = n$ for any binary vector, we have $f^\top L_t f \le n\,\lambda_{\max}(L_t)$ and $f^\top \tilde L_t f \le n\,\lambda_{\max}(\tilde L_t)$, and therefore

$$\frac{F_{t+1}(\mathcal{P})}{F_t(\mathcal{P})} \le 1 - \frac{\big[\max\big(0,\; f^\top(\tilde L_t - L_t)f\big)\big]^2}{8\, n\, F_t(\mathcal{P})\big(\sqrt{\lambda_{\max}(L_t)} + \sqrt{\lambda_{\max}(\tilde L_t)}\big)^2} = 1 - \delta_t.$$

Since we choose $f$ to maximize $f^\top(\tilde L_t - L_t)f$, and $\frac{1}{n}\mathbf{1}^\top(\tilde L_t - L_t)\mathbf{1} = 0$ because $L_t$ and $\tilde L_t$ are graph Laplacians, the chosen $f$ guarantees $f^\top(\tilde L_t - L_t)f \ge 0$. Moreover,

$$\lambda_{\max}(\tilde L_t - L_t) \le \lambda_{\max}(L_t) + \lambda_{\max}(\tilde L_t) \le 2\max\big(\lambda_{\max}(L_t),\; \lambda_{\max}(\tilde L_t)\big), \tag{23}$$

which completes the proof of Theorem 3.4.

APPENDIX D
PROOF OF THEOREM 3.5
Finally, we can upper bound $F_t(\mathcal{P})$ as follows:

$$F_t(\mathcal{P}) = \sum_{i,j,k=1}^{n} \gamma_{i,j}\,\tilde\gamma_{i,k} = \frac{1}{2}\mathbf{1}^\top (S_t + \tilde S_t)\mathbf{1} \le \frac{n}{2}\,\lambda_{\max}(S_t + \tilde S_t). \tag{24}$$

Using the above inequality, we can bound $F_{T+1}(\mathcal{P})$ as follows:

$$F_{T+1}(\mathcal{P}) = F_0(\mathcal{P})\prod_{t=0}^{T}\frac{F_{t+1}(\mathcal{P})}{F_t(\mathcal{P})} \le F_0(\mathcal{P})\prod_{t=0}^{T}(1 - \delta_t),$$

which completes the proof.

ACKNOWLEDGMENTS
This work was supported by the US National Science Foundation (NSF) under grant IIS-0643494 and by the National Center for Research Resources (NCRR) under grant No. 1 UL1 RR024153. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF, NCRR, Intel, Michigan State University, or Carnegie Mellon University.
REFERENCES
[1] P. Croskerry, "The Theory and Practice of Clinical Decision-Making," Canadian J. Anesthesia, vol. 52, no. 6, pp. R1-R8, 2005.
[2] L. Yang, R. Jin, R. Sukthankar, B. Zheng, L. Mummert, M. Satyanarayanan, M. Chen, and D. Jukic, "Learning Distance Metrics for Interactive Search-Assisted Diagnosis of Mammograms," Proc. SPIE Conf. Medical Imaging, 2007.
[3] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-Based Image Retrieval at the End of the Early Years," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1349-1380, Dec. 2000.
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
[24]
[25]
[26] S.C.H. Hoi, W. Liu, M.R. Lyu, and W.-Y. Ma, "Learning Distance Metrics with Contextual Constraints for Image Retrieval," Proc. IEEE CS Conf. Computer Vision and Pattern Recognition, 2006.
[27] M. Sugiyama, "Local Fisher Discriminant Analysis for Supervised Dimensionality Reduction," Proc. Int'l Conf. Machine Learning, 2006.
[28] T.-K. Kim, S.-F. Wong, B. Stenger, J. Kittler, and R. Cipolla, "Incremental Linear Discriminant Analysis Using Sufficient Spanning Set Approximations," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007.
[29] M. Schultz and T. Joachims, "Learning a Distance Metric from Relative Comparisons," Advances in Neural Information Processing Systems, MIT Press, 2004.
[30] J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov, "Neighbourhood Components Analysis," Advances in Neural Information Processing Systems, MIT Press, 2005.
[31] L. Yang, R. Jin, R. Sukthankar, and Y. Liu, "An Efficient Algorithm for Local Distance Metric Learning," Proc. Nat'l Conf. Artificial Intelligence, 2006.
[32] A.B. Hillel and D. Weinshall, "Learning Distance Function by Coding Similarity," Proc. Int'l Conf. Machine Learning, 2007.
[33] A. Woznica, A. Kalousis, and M. Hilario, "Learning to Combine Distances for Complex Representations," Proc. Int'l Conf. Machine Learning, 2007.
[34] W. Zhang, X. Xue, Z. Sun, Y. Guo, and H. Lu, "Optimal Dimensionality of Metric Space for Classification," Proc. Int'l Conf. Machine Learning, 2007.
[35] H. Wang, H. Zha, and H. Qin, "Dirichlet Aggregation: Unsupervised Learning Towards an Optimal Metric for Proportional Data," Proc. Int'l Conf. Machine Learning, 2007.
[36] S. Zhou, B. Georgescu, D. Comaniciu, and J. Shao, "BoostMotion: Boosting a Discriminative Similarity Function for Motion Estimation," Proc. IEEE CS Conf. Computer Vision and Pattern Recognition, 2006.
[37] B. Babenko, P. Dollar, and S. Belongie, "Task Specific Local Region Matching," Proc. Int'l Conf. Computer Vision, 2007.
[38] F. Li, J. Yang, and J. Wang, "A Transductive Framework of Distance Metric Learning by Spectral Dimensionality Reduction," Proc. Int'l Conf. Machine Learning, 2007.
[39] P. Dollar, V. Rabaud, and S. Belongie, "Non-Isometric Manifold Learning: Analysis and an Algorithm," Proc. Int'l Conf. Machine Learning, 2007.
[40] J.V. Davis, B. Kulis, P. Jain, S. Sra, and I.S. Dhillon, "Information-Theoretic Metric Learning," Proc. Int'l Conf. Machine Learning, 2007.
[41] J. Dillon, Y. Mao, G. Lebanon, and J. Zhang, "Statistical Translation, Heat Kernels, and Expected Distance," Proc. Conf. Uncertainty in Artificial Intelligence, 2007.
[42] L. Yang, R. Jin, and R. Sukthankar, "Bayesian Active Distance Metric Learning," Proc. Conf. Uncertainty in Artificial Intelligence, 2007.
[43] L. Torresani and K. Lee, "Large Margin Component Analysis," Advances in Neural Information Processing Systems, MIT Press, 2007.
[44] K.Q. Weinberger, F. Sha, Q. Zhu, and L.K. Saul, "Graph Laplacian Regularization for Large-Scale Semidefinite Programming," Advances in Neural Information Processing Systems, MIT Press, 2007.
[45] D. Zhou, J. Huang, and B. Schölkopf, "Learning with Hypergraphs: Clustering, Classification, and Embedding," Advances in Neural Information Processing Systems, MIT Press, 2007.
[46] Z. Zhang and J. Wang, "MLLE: Modified Locally Linear Embedding Using Multiple Weights," Advances in Neural Information Processing Systems, MIT Press, 2007.
[47] A. Frome, Y. Singer, and J. Malik, "Image Retrieval and Classification Using Local Distance Functions," Advances in Neural Information Processing Systems, MIT Press, 2007.
[48] O. Boiman and M. Irani, "Similarity by Composition," Advances in Neural Information Processing Systems, MIT Press, 2007.
[49] M. Belkin and P. Niyogi, "Convergence of Laplacian Eigenmaps," Advances in Neural Information Processing Systems, MIT Press, 2007.
[50] T. Hertz, A.B. Hillel, and D. Weinshall, "Boosting Margin Based Distance Functions for Clustering," Proc. Int'l Conf. Machine Learning, http://www.cs.huji.ac.il/daphna/code/DistBoost.zip, 2004.
[51] T. Hertz, A.B. Hillel, and D. Weinshall, "Learning a Kernel Function for Classification with Small Training Samples," Proc. Int'l Conf. Machine Learning, 2006.