- ALOK TALEKAR
- SAIRAM SUNDARESAN
11/23/2010
Presentation structure:
4. Mathematical background:
   1. Trees
   2. K-Means
   3. Modifications to K-Means
7. Conclusion.
Problem
Variations in:
Scale
Illumination
Rotation
Noise: different sources for images.
Image Recognition
Mathematics
Voronoi diagram and vector quantization
Trees: kd-trees, vocabulary trees
K-Means clustering
Modifications to k-Means
Voronoi diagram
Named after Georgy Voronoi.
Let $P$ be a set of $n$ distinct points (sites) in the plane.
The Voronoi diagram of $P$ is the subdivision of the plane into $n$ cells, one for each site.
A point $q$ lies in the cell corresponding to a site $p_i \in P$ iff
$$\mathrm{dist}(q, p_i) < \mathrm{dist}(q, p_j) \quad \text{for each } p_j \in P,\ j \neq i,$$
where $\mathrm{dist}$ is the Euclidean distance.
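The cell-membership rule can be checked directly by finding the nearest site. The following is a minimal sketch, not an efficient Voronoi construction; the function name `voronoi_cell` and the sample sites are illustrative assumptions.

```python
import math

def voronoi_cell(q, sites):
    """Return the index i of the site whose Voronoi cell contains q.

    Brute force: compares the Euclidean distance from q to every site.
    On an exact tie (q on a cell boundary) the lowest index is returned.
    """
    return min(range(len(sites)), key=lambda i: math.dist(q, sites[i]))

# Hypothetical sites in the plane.
sites = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
print(voronoi_cell((1.0, 1.0), sites))
print(voronoi_cell((3.0, 0.5), sites))
```

Each query costs O(n) here; the tree structures discussed later exist to avoid this linear scan.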
Voronoi diagram
Choose a measure of distance.
Partition the space.
Clustering
Iterative.
K-Means Clustering
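Initialization: Given K categories and N points in feature space, pick K points randomly; these are the initial cluster centers (means) m1, ..., mK. Repeat the following:
1. Assign each of the N points to its nearest cluster center.
2. Recompute each cluster center as the mean of the points assigned to it.
Stop when the assignments no longer change.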
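A minimal sketch of the standard K-means iteration (Lloyd's algorithm): pick K initial means at random, then alternate between assigning each point to its nearest mean and recomputing each mean. The function name, the `max_iters` cap, and the handling of empty clusters are assumptions made for illustration.

```python
import math
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Cluster a list of coordinate tuples into k groups."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # pick K points randomly as initial means
    assignment = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest mean.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, means[j])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each mean becomes the centroid of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # assumption: an empty cluster keeps its old mean
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, assignment

# Hypothetical data: two well-separated groups.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
       (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
means, labels = kmeans(pts, 2)
```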
Complexity
Finding the optimal clustering is NP-hard, but the heuristic algorithm is quick to converge.
There is no guarantee that it will converge to the global optimum, and the result may depend on the initial clusters.
As the algorithm is usually very fast, it is common to run it multiple times with different starting conditions.
Similarity to EM: EM maintains probabilistic assignments to clusters instead of deterministic assignments, and multivariate Gaussian distributions instead of means.
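Running K-means several times and keeping the best run can be sketched as follows. The selection criterion used here, the within-cluster sum of squared distances, is an assumption (the slides do not name one), and the function names are illustrative.

```python
import math
import random

def kmeans(points, k, rng, max_iters=100):
    """One K-means run from a random initialization drawn from rng."""
    means = rng.sample(points, k)
    for _ in range(max_iters):
        labels = [min(range(k), key=lambda j: math.dist(p, means[j]))
                  for p in points]
        new_means = []
        for j in range(k):
            members = [p for p, a in zip(points, labels) if a == j]
            # Assumption: an empty cluster keeps its previous mean.
            new_means.append(
                tuple(sum(c) / len(members) for c in zip(*members))
                if members else means[j]
            )
        if new_means == means:
            break  # means stopped moving
        means = new_means
    return means

def inertia(points, means):
    """Sum of squared distances from each point to its nearest mean."""
    return sum(min(math.dist(p, m) for m in means) ** 2 for p in points)

def best_of_restarts(points, k, restarts=10, seed=0):
    """Run K-means `restarts` times and keep the lowest-inertia result."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda means: inertia(points, means))

# Hypothetical data: three small groups.
pts = [(0.0, 0.0), (0.5, 0.2), (5.0, 5.0), (5.2, 4.9), (9.0, 0.0), (9.1, 0.3)]
best = best_of_restarts(pts, 3)
```

The restarts only improve the odds of a good local optimum; they do not guarantee the global one.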
Why Trees?
Quick to understand and implement.
Computational benefits:
Most algorithms (when implemented correctly) give O(n log n) performance.
The speedup comes from dividing the search space at each stage.
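As an illustration of dividing the search space at each stage, here is a minimal kd-tree sketch with an exact nearest-neighbor query. The dictionary node layout and function names are assumptions; the build re-sorts at every level for brevity, which a production implementation would avoid.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively split the points at the median along alternating axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, q, best=None):
    """Exact nearest neighbor of q, pruning subtrees that cannot help."""
    if node is None:
        return best
    if best is None or math.dist(q, node["point"]) < math.dist(q, best):
        best = node["point"]
    diff = q[node["axis"]] - node["point"][node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, q, best)
    # Visit the far side only if the splitting plane is closer than the
    # best match found so far.
    if abs(diff) < math.dist(q, best):
        best = nearest(far, q, best)
    return best

# Hypothetical 2-D points.
pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(pts)
print(nearest(tree, (9, 2)))
```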
Demo
http://www.visualthesaurus.com/
Describing an Image
Definition of Scoring
Number of descriptor vectors of each image with a path through node i ($n_i$ for the query, $m_i$ for the database image).
Number of images in the database with at least one descriptor vector path through node i ($N_i$).
[Figure: example tree with two database images. One node has N_i = 2, m_Img1 = 1, m_Img2 = 1; another node has N_i = 1, m_Img1 = 2, m_Img2 = 0.]
Slide credit: Tatiana Tommasi - Scalable Recognition with a Vocabulary Tree
Weights are assigned to each node (TF-IDF).
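A small sketch of TF-IDF scoring over tree nodes, assuming the weighting $w_i = \ln(N / N_i)$ and the L1-normalized difference score from the credited paper, Scalable Recognition with a Vocabulary Tree. The function names are illustrative, and the example assumes the database holds exactly the two images from the figure (N = 2).

```python
import math

def node_weights(num_images, images_per_node):
    """w_i = ln(N / N_i): nodes seen in every image get weight 0."""
    return [math.log(num_images / n_i) for n_i in images_per_node]

def tfidf_vector(counts, weights):
    """Weight the per-node counts and L1-normalize the result."""
    vec = [c * w for c, w in zip(counts, weights)]
    norm = sum(abs(v) for v in vec)
    return [v / norm for v in vec] if norm else vec

def score(query_counts, db_counts, weights):
    """L1 distance between normalized vectors; lower means more similar."""
    q = tfidf_vector(query_counts, weights)
    d = tfidf_vector(db_counts, weights)
    return sum(abs(a - b) for a, b in zip(q, d))

# Two nodes with N_i = 2 and N_i = 1, as in the figure; N = 2 is assumed.
weights = node_weights(2, [2, 1])
img1 = [1, 2]    # m_Img1 at the two nodes
img2 = [1, 0]    # m_Img2 at the two nodes
query = [1, 1]   # hypothetical query counts n_i
print(score(query, img1, weights), score(query, img2, weights))
```

The node visited by both images carries no weight, so only the second node separates Img1 from Img2 in this example.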
Implementation of Scoring
Every node is associated with an inverted file.
Inverted files store the ID numbers of the images in which a particular node occurs and the term frequency for that image.
This decreases the fraction of images in the database that have to be explicitly considered for a query.
[Figure: inverted file entries (image, term frequency) such as (Img1, 1), (Img2, 1) and (Img1, 2), shown alongside a query.]
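An inverted file per node can be sketched with a dictionary from node to a list of (image ID, term frequency) pairs. The node labels "A" and "B", the function names, and the data layout are assumptions chosen to mirror the entries in the figure.

```python
from collections import defaultdict

def build_inverted_files(image_node_counts):
    """Map each node to the (image_id, term_frequency) pairs that visit it."""
    inverted = defaultdict(list)
    for image_id, counts in image_node_counts.items():
        for node, tf in counts.items():
            inverted[node].append((image_id, tf))
    return inverted

def candidate_images(inverted, query_nodes):
    """Only images sharing at least one node with the query are considered."""
    return {
        image_id
        for node in query_nodes
        for image_id, _ in inverted.get(node, [])
    }

# Hypothetical database: per-image descriptor counts at nodes "A" and "B".
database = {"Img1": {"A": 1, "B": 2}, "Img2": {"A": 1}}
inverted = build_inverted_files(database)
print(candidate_images(inverted, ["B"]))
```

A query that touches only node "B" never has to score Img2, which is the saving the slide describes.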
Database
Ground-truth database: 6376 images.
Groups of four images of the same object under different conditions.
Each image in turn is used as the query image, and the three remaining images from its group should be at the top of the query results.
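One way to measure this, sketched under assumptions: count how many of the top three results (excluding the query itself) come from the query's own group. The function name, the `group_of` mapping, and the sample ranking are hypothetical.

```python
def top3_hits(query_id, ranked_ids, group_of):
    """Number of the top three results that share the query's group (0 to 3)."""
    ranked = [i for i in ranked_ids if i != query_id][:3]
    return sum(group_of[i] == group_of[query_id] for i in ranked)

# Hypothetical groups of four images each and one ranked result list.
group_of = {i: i // 4 for i in range(8)}
ranking = [0, 1, 5, 2, 3, 4, 6, 7]   # results for query image 0
print(top3_hits(0, ranking, group_of))
```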
Modifications to k-Means
Earlier approaches require an additional $O(K^2)$ space for the speedup, which is generally impractical when clustering into on the order of 1M clusters.
AKM (Approximate K-Means)
Uses kd-trees.
Gives almost equivalent performance at smaller scales and outperforms at larger scales.
Saves time by using elements of randomness to find an approximate nearest mean for each point, rather than the exact nearest neighbor.
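The idea of replacing the exact nearest-mean search with an approximate one can be sketched as below. This is a simplification and not the authors' method: it uses a single deterministic kd-tree over the means with greedy descent and no backtracking, and it omits the randomness the slide mentions. All names and parameters are assumptions.

```python
import math
import random

def build_kdtree(items, depth=0):
    """items is a list of (index, point); nodes are (item, axis, left, right)."""
    if not items:
        return None
    axis = depth % len(items[0][1])
    items = sorted(items, key=lambda it: it[1][axis])
    mid = len(items) // 2
    return (items[mid], axis,
            build_kdtree(items[:mid], depth + 1),
            build_kdtree(items[mid + 1:], depth + 1))

def approx_nearest(node, q):
    """Greedy root-to-leaf descent: fast, but may miss the true nearest mean."""
    best, best_d = None, math.inf
    while node is not None:
        (idx, point), axis, left, right = node
        d = math.dist(q, point)
        if d < best_d:
            best, best_d = idx, d
        node = left if q[axis] < point[axis] else right
    return best

def approximate_kmeans(points, k, iters=10, seed=0):
    """K-means whose assignment step uses the approximate search above."""
    rng = random.Random(seed)
    means = rng.sample(points, k)
    labels = []
    for _ in range(iters):
        # The tree is rebuilt every iteration because the means move.
        tree = build_kdtree(list(enumerate(means)))
        labels = [approx_nearest(tree, p) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, labels) if a == j]
            if members:  # assumption: an empty cluster keeps its old mean
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, labels

# Hypothetical data.
pts = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.8), (9.0, 1.0), (9.2, 0.7)]
means, labels = approximate_kmeans(pts, 3)
```

Each assignment visits roughly log K tree nodes instead of all K means, which is where the time saving comes from when K is very large.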
Applications
Tabulated Results
Data Organization: the server groups images by location.
Data Reduction: the server clusters, prunes and compresses descriptors.
System Timing: low-latency image matching on the cell phone.
Video of Results
System Overview
Since recognizing all the different locations purely visually is a very complicated vision task, the authors simplify it and use rough location as a cue.
The phone gets its position from GPS (or other means; it is good enough to know the rough location within a city block or two).
The phone prefetches data relevant to the current position.
Then the user snaps a photo.
The image is then matched against the prefetched data.
The result is displayed to the user.
[Diagram: system overview, with labels GPS and Server.]
Image Matching
How does the matching work, and what is in the prefetched data?
For example, the next slide shows some database images taken near the user's location.
The prefetched data contains features from these images. Features are extracted from the query image and matched against the features in the database.
The next step is to run geometric consistency (RANSAC) checks to eliminate false matches.
Finally, the image with the highest number of point matches is chosen.
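The matching and selection steps can be sketched as below. Assumptions: matches are accepted with a distance ratio test at a threshold of 0.8 (the slides mention a ratio test but give no value), descriptors are plain tuples, and the RANSAC geometric check is omitted for brevity. Function names are illustrative.

```python
import math

def ratio_test_matches(query_desc, db_desc, ratio=0.8):
    """Count query descriptors whose best match clearly beats the second best."""
    matches = 0
    for q in query_desc:
        dists = sorted(math.dist(q, d) for d in db_desc)
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            matches += 1
    return matches

def best_image(query_desc, database):
    """Pick the database image with the highest number of accepted matches."""
    return max(database,
               key=lambda name: ratio_test_matches(query_desc, database[name]))

# Hypothetical 2-D descriptors (real SURF descriptors have 64 dimensions).
query = [(0.0, 0.0), (1.0, 1.0)]
database = {
    "a": [(0.0, 0.1), (5.0, 5.0), (1.0, 1.1)],
    "b": [(3.0, 3.0), (3.1, 3.0), (9.0, 9.0)],
}
print(best_image(query, database))
```

In the full pipeline the surviving matches would still go through the geometric consistency check before an image is chosen.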
[Diagram: image matching pipeline. Labels: Query Image, SURF-64 Descriptors, Ratio Test Matching, Affine RANSAC, Prefetched Data, Database Images.]
Data Organization
[Figure: labels Kernel and Loxel.]
[Diagram: system pipeline.
Server: Geo-Tagged Images, Extract Features, Match Images, Geometric Consistency Check, Cluster Features, Prune Features, Compress Descriptors, Loxel-Based Feature Store.
Network.
Device: Device Location, Camera Image, Extract Features, Compute Feature Matches (ANN), Feature Cache, Geometric Consistency Check, Display Info for Top Ranked Image.]
[Figure: example feature cluster, showing a single feature and a meta-feature.]
[Figure: 4x reduction. Budget: 500, 200, 100, All.]
Feature Compression
Quantization
Entropy coding
Huffman tables
12 different tables
64-dimensional SURF
256 bytes uncompressed
37 bytes with compression
[Figure: Original Feature, Quantization, Entropy Coding, Compressed Feature. Descriptor components: Σdx, Σdy, Σ|dx|, Σ|dy|.]
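The two compression stages can be illustrated with a small sketch: uniform scalar quantization followed by Huffman coding. A 64-dimensional descriptor stored as 4-byte floats takes 64 x 4 = 256 bytes, matching the uncompressed figure above. The number of quantization levels, the value range, and the use of a single Huffman table are assumptions; the 12-table scheme behind the 37-byte figure is not reproduced here.

```python
import heapq
from collections import Counter

def quantize(descriptor, levels=8, lo=-1.0, hi=1.0):
    """Map each component to an integer bin in [0, levels - 1]."""
    step = (hi - lo) / levels
    return [min(levels - 1, max(0, int((x - lo) / step))) for x in descriptor]

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (count, tie-breaker, {symbol: depth so far}).
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, a = heapq.heappop(heap)
        c2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: depth + 1 for s, depth in {**a, **b}.items()}
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]

def compressed_bits(symbols):
    """Total bits needed to encode the symbols with their Huffman code."""
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)

# Hypothetical quantized descriptor components.
symbols = [0, 0, 0, 0, 1, 1, 2, 3]
print(compressed_bits(symbols))
```

Frequent bins get short codes, so a descriptor whose quantized values are unevenly distributed compresses well below the fixed-width size.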
Matching Results
[Figure: a query image and its Rank 1 to Rank 4 matches.]
Matching Performance
[Figure: true matches versus false matches.]
Timing Analysis
Nokia N95
332 MHz ARM
64 MB RAM
[Chart: time spent on Extract Features, Feature Matching, Geometric Consistency and Upload.]
Conclusions
The END
Questions?