This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TIP.2015.2395816, IEEE Transactions on Image Processing
SUBMITTED TO IEEE TRANSACTIONS ON IMAGE PROCESSING, 2014
I. INTRODUCTION
The popularity of digital cameras and mobile phone cameras has led to an explosive growth of digital images
available over the internet. How to accurately retrieve images from enormous collections of digital photos has become
an important research topic. Content-based image retrieval
(CBIR) addresses this challenge by identifying matched
images based on their visual similarity to a query image [1].
However, due to the semantic gap between the low-level visual
features used to represent images and the high-level semantic
tags used to describe image content, CBIR techniques achieve
limited performance [1], [2]. To address the limitation
1057-7149 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
FENG et al.: LEARNING TO RANK IMAGE TAGS WITH LIMITED TRAINING EXAMPLES
Fig. 1. [Schematic of the proposed framework. Panel labels: Training, Testing. Element labels: Visual feature; Tag ranking in descending order. Example tags shown in the figure: Cloud, Gravel, Lake, Mountain, People, Road, Shore, Snow, Summit, Airplane, Bicycle, City, Desert, Fruit, Horse, Penguin, Shop, Range.]
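\[
\varepsilon(x, y) = \sum_{j,k=1}^{m} \varepsilon_{j,k}(x, y) \tag{1}
\]
\[
f(W) = \frac{1}{n}\sum_{i=1}^{n}\sum_{j,k=1}^{m} \varepsilon_{j,k}(x_i, y_i)
     = \frac{1}{n}\sum_{i=1}^{n}\sum_{j,k=1}^{m} I(y_j^i \neq y_k^i)\,
       \ell\big((y_j^i - y_k^i)(w_j^\top x_i - w_k^\top x_i)\big) \tag{2}
\]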
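As a rough illustration of the pairwise ranking error in (2), the following NumPy sketch evaluates f(W) for a small batch. It assumes a squared hinge loss for the loss function; that choice, along with the names `squared_hinge` and `ranking_error`, is an assumption made for this example and is not taken from the text.

```python
import numpy as np


def squared_hinge(z):
    """Assumed smooth loss l(z) = max(0, 1 - z)^2 (not specified in the text)."""
    return np.maximum(0.0, 1.0 - z) ** 2


def ranking_error(W, X, Y, loss=squared_hinge):
    """Empirical pairwise ranking error in the form of Eq. (2).

    W: (d, m) matrix whose j-th column scores tag j.
    X: (n, d) visual features, one row per image.
    Y: (n, m) binary tag assignments.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    scores = X @ W                                          # (n, m): w_j^T x_i
    score_diff = scores[:, :, None] - scores[:, None, :]    # w_j^T x_i - w_k^T x_i
    label_diff = Y[:, :, None] - Y[:, None, :]              # y_j^i - y_k^i
    mask = label_diff != 0                                  # I(y_j^i != y_k^i)
    return float(np.sum(mask * loss(label_diff * score_diff)) / X.shape[0])
```

With a single image whose first tag is relevant and second is not, a zero matrix W gives an error of 2.0 under this loss (each ordered pair contributes 1), and the error drops to 0 once the relevant tag outscores the other by a margin of at least 1.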
Figure 1 illustrates the basic idea of the proposed framework
for tag ranking.
A straightforward approach to tag ranking is to search for a
matrix W that minimizes the ranking error f(W). This simple
approach is problematic: it can overfit the training data
when the number of training images is relatively
small and the number of unique tags is large. As with most
machine learning algorithms, an appropriate regularization
mechanism is needed to control the model complexity and
prevent overfitting. To effectively
capture the correlation among different tags, we follow [48]
and assume that the linear prediction functions in W are
linearly dependent and, consequently, that W is a low-rank
matrix, leading to the following optimization problem
\[
\min\; f(W) \tag{3}
\]
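To make the low-rank assumption concrete, the sketch below builds a d x m matrix W from two thin factors, so that its m columns (one linear prediction function per tag) are linearly dependent whenever r < m. The factorization W = U V^T and the helper names are illustrative assumptions, not the parameterization used by the authors.

```python
import numpy as np


def low_rank_weights(d, m, r, seed=0):
    """Build a (d, m) matrix of rank at most r as W = U V^T (illustrative only)."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((d, r))
    V = rng.standard_normal((m, r))
    return U @ V.T


def parameter_counts(d, m, r):
    """Free parameters of a full matrix versus the two thin factors."""
    return d * m, r * (d + m)


W = low_rank_weights(d=50, m=20, r=3)
rank_of_W = int(np.linalg.matrix_rank(W))
full_params, factored_params = parameter_counts(50, 20, 3)
```

In this toy setting the full matrix has 1000 entries while the factors have 210, which is the sense in which a rank constraint limits model complexity when tags are many and training images are few.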
\[
R\sqrt{\frac{2}{n}\Big[r(1 + d + m)\big(\log(18s) + \log n\big) + \log(2n^{2})\Big]}
\]
Proof: We divide our analysis into two steps. In the first step,
we focus on a fixed solution W; in the second step,
we generalize to any W.
a) Step 1: We consider a fixed solution W. In order
to bound $f(W) - \bar{\varepsilon}(W)$, we use the Hoeffding concentration
inequality.
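Theorem 2: (Hoeffding inequality). Suppose $X_1, \ldots, X_n$ are independent random variables with $a_j < b_j$,
$X_j \in [a_j, b_j]$, and $E[X_j] = 0$ for $j = 1, \ldots, n$. Then, for any $t > 0$,
\[
\Pr\Big\{\sum_{j=1}^{n} X_j \ge t\Big\} \le \exp\Big(-\frac{2t^{2}}{\sum_{j=1}^{n}(b_j - a_j)^{2}}\Big)
\]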
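A quick numerical sanity check of the Hoeffding bound can help build intuition. The sketch below compares the bound with a Monte Carlo estimate of the tail probability for sums of independent, zero-mean variables drawn uniformly from [-1, 1]. The uniform distribution, the trial count, and the function names are all choices made for this illustration.

```python
import numpy as np


def hoeffding_bound(t, widths):
    """Upper bound exp(-2 t^2 / sum_j (b_j - a_j)^2) on Pr{sum_j X_j >= t}."""
    widths = np.asarray(widths, dtype=float)
    return float(np.exp(-2.0 * t ** 2 / np.sum(widths ** 2)))


def empirical_tail(t, n, trials=20000, seed=0):
    """Monte Carlo estimate of Pr{sum_j X_j >= t} for X_j ~ Uniform[-1, 1]."""
    rng = np.random.default_rng(seed)
    sums = rng.uniform(-1.0, 1.0, size=(trials, n)).sum(axis=1)
    return float(np.mean(sums >= t))


bound = hoeffding_bound(t=10.0, widths=[2.0] * 50)   # each interval has width 2
estimate = empirical_tail(t=10.0, n=50)
```

Here the bound equals exp(-1), and the simulated tail frequency should fall well below it, since Hoeffding's inequality uses only the range of each variable and is therefore conservative.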
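To use the Hoeffding inequality, we define $X_i = \varepsilon(W, x_i, y_i) - \bar{\varepsilon}(W)$
with $|b_j - a_j| \le R$. As a result, we have,
for a fixed W,
\[
\Pr\Big\{\sum_{i=1}^{n}\big[\varepsilon(W, x_i, y_i) - \bar{\varepsilon}(W)\big] \ge t\Big\} \le \exp\Big(-\frac{2t^{2}}{nR^{2}}\Big)
\]
By setting $\exp\big(-\frac{2t^{2}}{nR^{2}}\big) = \delta$, we have
\[
t = R\sqrt{\frac{n}{2}\log\frac{1}{\delta}}
\]
\[
\Pr\Big\{n\big[f(W) - \bar{\varepsilon}(W)\big] \ge R\sqrt{\frac{n}{2}\log\frac{1}{\delta}}\Big\} \le \delta
\]
so that, with probability at least $1 - \delta$,
\[
f(W) - \bar{\varepsilon}(W) \le R\sqrt{\frac{2}{n}\log\frac{2}{\delta}}
\]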
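\[
\max_{W \in N(\Omega, \epsilon)} f(W) - \bar{\varepsilon}(W) \le R\sqrt{\frac{2}{n}\log\frac{2}{\delta}}
\]
or, if we redefine $\delta$ as $\frac{1}{|N(\Omega, \epsilon)|}\,\delta$,
\[
\max_{W \in N(\Omega, \epsilon)} f(W) - \bar{\varepsilon}(W) \le R\sqrt{\frac{2}{n}\Big[\log|N(\Omega, \epsilon)| + \log\frac{2}{\delta}\Big]}
\]
\[
\Big(\frac{18s}{\epsilon}\Big)^{(1 + m + d)r}
\]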
L|W W | L,
L|W W | L
(5)
(6)
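\[
\min_{W \in \mathbb{R}^{d \times m}} f(W) + \lambda\|W\|_{*} \tag{7}
\]
\[
\nabla f(W) = \frac{1}{n}\sum_{i=1}^{n}\sum_{j,k=1}^{m} \alpha^{i}_{jk}\,(y_j^i - y_k^i)\, x_i\,(e^{m}_{j} - e^{m}_{k})^{\top} \tag{8}
\]
where
\[
\alpha^{i}_{jk} = I(y_j^i \neq y_k^i)\,\ell'\big((y_j^i - y_k^i)\, x_i^{\top}(w_j - w_k)\big)
\]
and $e^{m}_{j}$ is a vector of m dimensions with all the elements
being zero except that its jth entry is 1.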
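The gradient of the pairwise ranking error can be sketched in NumPy as follows. The sketch assumes a squared hinge loss, which is differentiable, so that its derivative plays the role of the loss derivative in the coefficient definition above. The loss choice and all function names are assumptions made for illustration.

```python
import numpy as np


def loss(z):
    """Assumed loss l(z) = max(0, 1 - z)^2."""
    return np.maximum(0.0, 1.0 - z) ** 2


def loss_derivative(z):
    """Derivative l'(z) = -2 max(0, 1 - z) of the assumed loss."""
    return -2.0 * np.maximum(0.0, 1.0 - z)


def ranking_error_and_grad(W, X, Y):
    """Return f(W) and its (d, m) gradient for the pairwise ranking error."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    scores = X @ W                                          # (n, m)
    label_diff = Y[:, :, None] - Y[:, None, :]              # y_j^i - y_k^i
    margin = label_diff * (scores[:, :, None] - scores[:, None, :])
    mask = label_diff != 0                                  # I(y_j^i != y_k^i)
    f = np.sum(mask * loss(margin)) / n
    # Per-pair coefficient: indicator * l'(margin) * (y_j^i - y_k^i).
    coef = mask * loss_derivative(margin) * label_diff      # (n, m, m)
    # The factor (e_j - e_k) adds coef[i, j, k] to tag j and subtracts it from tag k.
    grad_scores = coef.sum(axis=2) - coef.sum(axis=1)       # (n, m)
    return float(f), X.T @ grad_scores / n
```

A finite-difference comparison on random data is a cheap way to confirm that the analytic gradient and the objective agree before plugging them into an optimizer.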
The main computational challenge in implementing the
gradient descent approach for optimizing (9) arises from the
high cost of computing the singular value decomposition of
$W_t$. It is known [50] that when the objective function is
smooth, the gradient method can be accelerated to achieve the
optimal convergence rate of $O(T^{-2})$. It was shown recently
[51]-[53] that a similar scheme can be applied to accelerate
optimization problems where the objective function consists of
a smooth part and a trace norm regularization. In this work,
we adopt the accelerated proximal gradient (APG) method
[54] for solving the optimization problem in (7). Specifically,
according to [54], we update the solution $W_t$ by solving the
following optimization problem
\[
W_t = \arg\min_{W} \frac{1}{2\eta_t}\|W - W'_t\|_F^{2} + \lambda\|W\|_{*} \tag{10}
\]
where
\[
W'_t = W_{t-1} - \eta_t \nabla f(W_{t-1})
\]
\[
\theta_{k+1} = \frac{1 + \sqrt{1 + 4\theta_k^{2}}}{2},
\]
\[
Z_{k+1} = W_k + \Big(\frac{\theta_k - 1}{\theta_{k+1}}\Big)(W_k - W_{k-1}).
\]
end while
Output: The optimal solution W.
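The update steps above can be sketched as a generic accelerated proximal gradient loop with a trace norm penalty. The sketch makes several assumptions that are not confirmed by the text: a fixed step size `eta` (no line search), a regularization weight `lam`, a gradient step taken at the search point Z as in standard accelerated schemes, and a proximal step computed by soft-thresholding the singular values. The names `svt` and `apg_trace_norm` are hypothetical.

```python
import numpy as np


def svt(A, tau):
    """Singular value thresholding: minimizer of 0.5*||W - A||_F^2 + tau*||W||_*."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def apg_trace_norm(grad_f, W0, lam, eta, n_iters=100):
    """Accelerated proximal gradient sketch for min_W f(W) + lam*||W||_*.

    grad_f: callable returning the gradient of the smooth part f.
    eta: fixed step size (assumed; should not exceed 1/L for an L-smooth f).
    """
    W = W0.copy()
    Z = W0.copy()
    theta = 1.0
    for _ in range(n_iters):
        W_prev = W
        # Gradient step at the search point, then the proximal (SVT) step.
        W = svt(Z - eta * grad_f(Z), eta * lam)
        # Momentum weights and extrapolated search point.
        theta_next = (1.0 + np.sqrt(1.0 + 4.0 * theta ** 2)) / 2.0
        Z = W + ((theta - 1.0) / theta_next) * (W - W_prev)
        theta = theta_next
    return W
```

For the toy objective f(W) = 0.5*||W - A||_F^2 with step size 1, every iterate equals `svt(A, lam)`, which gives a simple correctness check. Each iteration costs one SVD, which is the expense the surrounding text refers to.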
Fig. 2. Convergence of the objective function on the image datasets Corel5K, ESPGame, and IAPRTC-12. [Three panels; x-axis: Number of Iteration.]
\[
AP@K = \frac{1}{n_t}\sum_{i=1}^{n_t}\frac{N_c(i)}{K} \tag{11}
\]
\[
AR@K = \frac{1}{n_t}\sum_{i=1}^{n_t}\frac{N_c(i)}{N_g(i)} \tag{12}
\]
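The two measures in (11) and (12) can be computed as in the sketch below. It assumes that N_c(i) is the number of correct tags among the top K predictions for test image i, that N_g(i) is the number of ground-truth tags of that image, and that n_t is the number of test images. These readings, and the function name, are assumptions since the definitions do not appear in the surrounding text.

```python
import numpy as np


def precision_recall_at_k(scores, Y_true, K):
    """Average precision and recall at K over a set of test images.

    scores: (n_t, m) predicted tag scores; higher means more relevant.
    Y_true: (n_t, m) binary ground-truth tag matrix.
    """
    scores = np.asarray(scores, dtype=float)
    Y_true = np.asarray(Y_true)
    # Indices of the K highest-scoring tags for each image.
    top_k = np.argsort(-scores, axis=1, kind="stable")[:, :K]
    n_correct = np.take_along_axis(Y_true, top_k, axis=1).sum(axis=1)   # N_c(i)
    n_truth = Y_true.sum(axis=1)                                        # N_g(i)
    ap = np.mean(n_correct / K)
    # Guard against images with no ground-truth tags (an added safeguard).
    ar = np.mean(n_correct / np.maximum(n_truth, 1))
    return float(ap), float(ar)


scores = [[0.9, 0.1, 0.8], [0.2, 0.7, 0.1]]
Y_true = [[1, 0, 0], [0, 1, 1]]
ap_at_2, ar_at_2 = precision_recall_at_k(scores, Y_true, K=2)
```

In this example each image has one correct tag among its top two predictions, so AP@2 is 0.5, while AR@2 is 0.75 because the second image recovers only one of its two ground-truth tags.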
TABLE I
STATISTICS FOR THE DATASETS USED IN THE EXPERIMENTS. THE BOTTOM TWO ROWS ARE GIVEN IN THE FORMAT MEAN/MAXIMUM.

                   Corel5K      ESPGame      IAPRTC-12    Pascal VOC2007   SUNAttribute
No. of images      4,999        20,770       19,627       9,963            14,340
Vocabulary size    260          268          291          399              102
Tags per image     3.4/5        4.69/15      5.72/23      4.2/35           15.5/37
Images per tag     58.6/1,004   363/5,059    386/5,534    53/2,095         2,183/11,878

\[
NDCG@K = \frac{1}{Z}\sum_{i=1}^{K}\frac{2^{rel(i)} - 1}{\log(1 + i)} \tag{13}
\]
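A minimal sketch of the NDCG@K measure in (13) follows. It assumes that rel(i) is the relevance of the tag ranked at position i and that Z is the value of the sum for an ideal ordering of the same tags, so that a perfect ranking scores 1. That reading of Z is the usual convention but is an assumption here, as are the function names.

```python
import numpy as np


def dcg_at_k(rel, K):
    """Unnormalized sum over the top K positions of (2^rel(i) - 1) / log(1 + i)."""
    rel = np.asarray(rel, dtype=float)[:K]
    positions = np.arange(1, rel.size + 1)
    return float(np.sum((2.0 ** rel - 1.0) / np.log(1.0 + positions)))


def ndcg_at_k(rel_in_predicted_order, K):
    """NDCG@K with Z taken as the DCG of the ideal (descending) ordering."""
    ideal = np.sort(np.asarray(rel_in_predicted_order, dtype=float))[::-1]
    Z = dcg_at_k(ideal, K)
    if Z == 0.0:
        return 0.0  # no relevant tags at all
    return dcg_at_k(rel_in_predicted_order, K) / Z
```

Because the same logarithm appears in both the numerator and the normalizer, the choice of logarithm base does not affect the result.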
Fig. 3. Average precision and recall for automatic image annotation on the Corel5K, ESPGame, and IAPRTC-12 datasets. [Panels plot Average Precision@K (%) and Average Recall@K (%) against Top K Annotated Tags for JEC, TagProp, SVM, FastTag, MLR, and Proposed.]
Fig. 4. Evaluation of different loss functions and matrix regularizers for automatic image annotation. [Panels plot Average Precision@K (%) and Average Recall@K (%) against Top K Annotated Tags for the variants C+F, R+F, C+T, and R+T.]
Fig. 5. Average precision at rank 5 (AP@5) with varied numbers of training images for the datasets Corel5K, ESPGame, and IAPRTC-12. [Panels plot AP@5 (%) against Different Percent of Training Samples (10%, 20%, 50%, 90%) for JEC, TagProp, SVM, FastTag, MLR, and Proposed.]
[Figure: two panels plotting Average NDCG@K for Proposed, TW, RPTR, PRW, TagRel, and SVM.]
Fig. 6. Performance of automatic image annotation on the ESPGame dataset with incomplete image tags, where the fraction of observed tags is varied over 20%, 40%, and 60%. [Panels plot Average Precision@K (%) against Top K Annotated Tags for JEC, TagProp, SVM, FastTag, MLR, and Proposed.]
Fig. 7. Performance of automatic image annotation on the IAPRTC-12 dataset with incomplete image tags, where the fraction of observed tags is varied over 20%, 40%, and 60%. [Panels plot Average Precision@K (%) against Top K Annotated Tags for JEC, TagProp, SVM, FastTag, MLR, and Proposed.]
Fig. 9. Average precision and recall of the proposed method for the IAPRTC-12 dataset with varied $\lambda$. [Panels plot Average Precision@K and Average Recall@K for K=1, K=4, K=7, and K=10; x-axis ticks shown: 0.01, 0.1, 10, 100.]
[1] R. Datta, D. Joshi, J. Li, and J. Wang, "Image retrieval: ideas, influences,
and trends of the new age," ACM Computing Surveys, vol. 40, no. 2,
2008.
[2] J. Wu, H. Shen, Y. Li, Z. Xiao, M. Lu, and C. Wang, "Learning a hybrid
similarity measure for image retrieval," Pattern Recognition, vol. 46,
no. 11, pp. 2927-2939, 2013.
[3] L. Wu, R. Jin, and A. Jain, "Tag completion for image retrieval," IEEE
Trans. on Pattern Analysis and Machine Intelligence, vol. 35, no. 3, pp.
716-727, 2013.
[4] A. Makadia, V. Pavlovic, and S. Kumar, "Baselines for image annotation,"
International Journal of Computer Vision, vol. 90, no. 1, pp.
88-105, 2010.
[5] J. Tang, R. Hong, S. Yan, and T. Chua, "Image annotation by knn-sparse
graph-based label propagation over noisily-tagged web images," ACM
Trans. on Intelligent Systems and Technology, vol. 2, no. 2, pp. 1-16,
2011.
[6] J. Tang, S. Yan, R. Hong, G. Qi, and T. Chua, "Inferring semantic
concepts from community-contributed images and noisy tags," in ACM
Int. Conf. on Multimedia, 2009, pp. 223-232.
[7] M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid, "Tagprop:
discriminative metric learning in nearest neighbor models for image
auto-annotation," in IEEE Int. Conf. on Computer Vision, 2009, pp.
309-316.
[8] W. Liu and D. Tao, "Multiview hessian regularization for image annotation,"
IEEE Trans. on Image Processing, vol. 22, no. 7, pp. 2676-2687,
2013.
Fig. 10. Examples of test images from both the ESPGame and IAPRTC-12 datasets with the top 10 annotations generated by different methods. The correct
tags are highlighted in bold font, whereas the incorrect ones are highlighted in italic font. [Rows: Ground Truth, JEC, TagProp, SVM, FastTag, MLR, and Proposed.]