
A Bayesian inference theory of attention
Sharat Chikkerur, Thomas Serre & Tomaso Poggio
CBCL, McGovern Institute for Brain Research, MIT
Attention
Computational role
• Filter theory (Broadbent)
• Bayesian surprise (Itti)
• Biased competition (Desimone)
• Bottleneck (Tsotsos)
• Feature integration theory (Treisman)
• Guided search (Wolfe)
• Scanpath theory (Noton)
Biology
• V1
• V4
• MT
• LIP
• FEF
Effects
• Contrast gain
• Response gain
• Modulation under spatial attention
• Modulation under feature attention
• Pop-out
• Serial vs. Parallel
• Bottom-up vs. Top-down
Role of attention
Invariant recognition: mixed blessing
• Large pooling in the higher regions leads to invariance
(Kreiman G., C. Hung, T. Poggio and J. DiCarlo, SUNS 06)
Limitation of feed-forward model
IT: Zoccolan, Kouh, Poggio & DiCarlo 2007
V4: Reynolds, Chelazzi & Desimone 1999
Psychophysics: Serre, Oliva & Poggio 2007
Feedforward vs. attentive processing

Attention is needed to recognize objects under clutter


A theoretical framework
Perception as Bayesian inference
• Lee and Mumford, “Hierarchical Bayesian inference in the visual cortex”, JOSA A, 20(7), 2003
• Recurrent feed-forward/feedback loops integrate bottom-up information with top-down priors
• Bottom-up signals: data dependent
• Top-down signals: task dependent
• Top-down signals provide context information and help to disambiguate bottom-up signals
[Figure: hierarchy of variables x0 (image), xV1, xV2, xV4, xIT]
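The central step of this framework, in which each level combines a data-dependent bottom-up likelihood with a task-dependent top-down prior, can be sketched in a few lines. This is a minimal illustration under assumed numbers: the two hypotheses ("edge", "texture"), all probabilities, and the helper `combine` are hypothetical and not taken from the cited paper.

```python
def combine(likelihood, prior):
    """Posterior at one level: P(x | below, above) proportional to P(below | x) * P(x | above)."""
    unnormalized = {h: likelihood[h] * prior[h] for h in likelihood}
    z = sum(unnormalized.values())
    return {h: v / z for h, v in unnormalized.items()}


# Hypothetical ambiguous patch: the bottom-up evidence cannot tell the two hypotheses apart.
bottom_up = {"edge": 0.5, "texture": 0.5}  # data-dependent likelihood
top_down = {"edge": 0.8, "texture": 0.2}   # task-dependent prior from the level above

posterior = combine(bottom_up, top_down)
print(posterior)  # {'edge': 0.8, 'texture': 0.2}
```

With an uninformative bottom-up signal the posterior simply follows the top-down prior, which is the sense in which top-down context disambiguates bottom-up signals.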
Attention as Bayesian inference
(MIT, Chikkerur, Serre, Poggio)
[Figure: cortical areas PFC, LIP/FEF, IT, V4, V2; Desimone, MIT (unpublished)]
Spatial attention: What is at location L?
Feature-based attention: Where is object O?
Model description
[Figure: graphical model with a “Where” node L, “What” nodes F^i, feature-map nodes F^i_l (plate over N feature maps), and the image]
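One way to make the “Where”/“What” structure concrete is a small discrete model solved by brute-force enumeration. The sketch below is a simplification under stated assumptions, not the authors' implementation: L ranges over a few locations, each F^i is binary, each feature-map unit F^i_l is on with probability `p_on` only when F^i = 1 and L = l (otherwise with a small `p_off`), and the image enters only through hypothetical per-unit likelihoods `lik_on`/`lik_off`. All names and values are made up for illustration.

```python
from itertools import product


def posterior(p_L, p_F, lik_on, lik_off, p_on=0.9, p_off=0.05):
    """Exact P(L | I) and P(F^i = 1 | I) for a toy "where"/"what" model.

    p_L[l]        prior over location L ("where")
    p_F[i]        prior P(F^i = 1) that feature i is present ("what")
    lik_on[i][l]  P(I | F^i_l = 1), hypothetical bottom-up evidence for unit F^i_l
    lik_off[i][l] P(I | F^i_l = 0)
    p_on, p_off   assumed P(F^i_l = 1 | F^i, L): p_on if F^i = 1 and L = l, else p_off
    """
    n_loc, n_feat = len(p_L), len(p_F)
    post_L = [0.0] * n_loc
    post_F = [0.0] * n_feat
    total = 0.0
    # Enumerate every joint state of L and (F^1, ..., F^N); the units F^i_l are summed out.
    for l, f in product(range(n_loc), product((0, 1), repeat=n_feat)):
        w = p_L[l]
        for i in range(n_feat):
            w *= p_F[i] if f[i] else 1.0 - p_F[i]
            for j in range(n_loc):
                q = p_on if (f[i] and j == l) else p_off
                w *= q * lik_on[i][j] + (1.0 - q) * lik_off[i][j]
        total += w
        post_L[l] += w
        for i in range(n_feat):
            if f[i]:
                post_F[i] += w
    return [v / total for v in post_L], [v / total for v in post_F]


# Hypothetical scene: 3 locations, 2 features; feature 0 has strong evidence at location 2.
lik_on = [[0.1, 0.1, 0.9], [0.1, 0.1, 0.1]]
lik_off = [[0.9, 0.9, 0.1], [0.9, 0.9, 0.9]]
post_L, post_F = posterior([1 / 3] * 3, [0.5, 0.5], lik_on, lik_off)
print([round(p, 2) for p in post_L])  # location 2 is most probable
print([round(p, 2) for p in post_F])  # feature 0 likely present, feature 1 likely absent
```

Enumeration costs n_loc × 2^N terms, which is fine for a toy but not for realistic feature sets; the point here is the structure of the model, not the inference method.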
Model properties: invariance
[Figure: the same graphical model (L, F^i, F^i_l, N)]
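Invariance can be read off from marginalizing over L: the posterior on a “What” variable sums over all possible locations, so with a uniform location prior, moving the evidence to another location leaves P(F | I) unchanged. The single-feature sketch below checks this under assumed parameters (binary F; feature-map units on with probability `p_on` at the object's location and `p_off` elsewhere; made-up likelihoods). The function `p_feature_present` is hypothetical, not the authors' code.

```python
def p_feature_present(lik_on, lik_off, p_L, p_F=0.5, p_on=0.9, p_off=0.05):
    """P(F = 1 | I) for one feature in a toy location/feature model.

    lik_on[j], lik_off[j]: hypothetical P(I | F_j = 1) and P(I | F_j = 0) at location j.
    """
    def unit(q, j):
        # Sum out the feature-map unit F_j, which is on with probability q.
        return q * lik_on[j] + (1.0 - q) * lik_off[j]

    n = len(p_L)
    # F = 0: every unit uses p_off regardless of L (and the prior over L sums to 1).
    absent = 1.0
    for j in range(n):
        absent *= unit(p_off, j)
    # F = 1: average over the unknown object location L.
    present = 0.0
    for l in range(n):
        w = p_L[l]
        for j in range(n):
            w *= unit(p_on if j == l else p_off, j)
        present += w
    return p_F * present / (p_F * present + (1.0 - p_F) * absent)


uniform = [0.25] * 4
at_left = p_feature_present([0.9, 0.1, 0.1, 0.1], [0.1, 0.9, 0.9, 0.9], uniform)
at_right = p_feature_present([0.1, 0.1, 0.1, 0.9], [0.9, 0.9, 0.9, 0.1], uniform)
print(round(at_left, 3), round(at_right, 3))  # equal: the "what" estimate ignores position
```

The location information is not lost; it remains in P(L | I), which does change when the evidence moves.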
Model properties: crowding
[Figure: the same graphical model (L, F^i, F^i_l, N)]
Model: spatial attention
[Figure: graphical model (L, F^i, F^i_l, N) with the location node L set to the attended location X]
What is at location X?
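Spatial attention can be treated as a query on a model of this kind: concentrate the prior P(L) on the attended location X, then read out P(F^i | I). In the toy enumeration below (a hypothetical sketch; the priors, likelihoods, and `p_on`/`p_off` are assumed, and `posterior` is an illustrative helper), the scene contains feature 0 at location 0 and feature 1 at location 2, and the answer to “what is at X?” switches with the attended location.

```python
from itertools import product


def posterior(p_L, p_F, lik_on, lik_off, p_on=0.9, p_off=0.05):
    """Toy exact inference: returns P(L | I) and P(F^i = 1 | I) by enumeration.

    Assumes F^i_l is on with probability p_on if F^i = 1 and L = l, else p_off;
    lik_on / lik_off are hypothetical bottom-up likelihoods P(I | F^i_l = 1 / 0).
    """
    n_loc, n_feat = len(p_L), len(p_F)
    post_L, post_F, total = [0.0] * n_loc, [0.0] * n_feat, 0.0
    for l, f in product(range(n_loc), product((0, 1), repeat=n_feat)):
        w = p_L[l]
        for i in range(n_feat):
            w *= p_F[i] if f[i] else 1.0 - p_F[i]
            for j in range(n_loc):
                q = p_on if (f[i] and j == l) else p_off
                w *= q * lik_on[i][j] + (1.0 - q) * lik_off[i][j]
        total += w
        post_L[l] += w
        for i in range(n_feat):
            post_F[i] += w * f[i]
    return [v / total for v in post_L], [v / total for v in post_F]


# Hypothetical scene: feature 0 at location 0, feature 1 at location 2.
lik_on = [[0.9, 0.1, 0.1], [0.1, 0.1, 0.9]]
lik_off = [[0.1, 0.9, 0.9], [0.9, 0.9, 0.1]]
no_bias = [0.5, 0.5]

_, what_at_0 = posterior([0.98, 0.01, 0.01], no_bias, lik_on, lik_off)  # attend location 0
_, what_at_2 = posterior([0.01, 0.01, 0.98], no_bias, lik_on, lik_off)  # attend location 2
print([round(p, 2) for p in what_at_0])  # feature 0 favored
print([round(p, 2) for p in what_at_2])  # feature 1 favored
```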
Model: feature-based attention
[Figure: graphical model (L, F^i, F^i_l, N) with the feature nodes F^i set to those of object X]
Where is object X?
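Feature-based attention can be treated as the complementary query: raise the prior P(F^i) for the searched-for object's features, then read out P(L | I). The toy enumeration below is a hypothetical sketch with assumed priors, likelihoods, and `p_on`/`p_off` (the helper `posterior` is illustrative, not the authors' code); with the same two-feature scene, searching for a different feature moves the location estimate.

```python
from itertools import product


def posterior(p_L, p_F, lik_on, lik_off, p_on=0.9, p_off=0.05):
    """Toy exact inference: returns P(L | I) and P(F^i = 1 | I) by enumeration.

    Assumes F^i_l is on with probability p_on if F^i = 1 and L = l, else p_off;
    lik_on / lik_off are hypothetical bottom-up likelihoods P(I | F^i_l = 1 / 0).
    """
    n_loc, n_feat = len(p_L), len(p_F)
    post_L, post_F, total = [0.0] * n_loc, [0.0] * n_feat, 0.0
    for l, f in product(range(n_loc), product((0, 1), repeat=n_feat)):
        w = p_L[l]
        for i in range(n_feat):
            w *= p_F[i] if f[i] else 1.0 - p_F[i]
            for j in range(n_loc):
                q = p_on if (f[i] and j == l) else p_off
                w *= q * lik_on[i][j] + (1.0 - q) * lik_off[i][j]
        total += w
        post_L[l] += w
        for i in range(n_feat):
            post_F[i] += w * f[i]
    return [v / total for v in post_L], [v / total for v in post_F]


# Hypothetical scene: feature 0 at location 0, feature 1 at location 2.
lik_on = [[0.9, 0.1, 0.1], [0.1, 0.1, 0.9]]
lik_off = [[0.1, 0.9, 0.9], [0.9, 0.9, 0.1]]
uniform_L = [1 / 3] * 3

where_0, _ = posterior(uniform_L, [0.99, 0.01], lik_on, lik_off)  # search for feature 0
where_1, _ = posterior(uniform_L, [0.01, 0.99], lik_on, lik_off)  # search for feature 1
print([round(p, 2) for p in where_0])  # peaks at location 0
print([round(p, 2) for p in where_1])  # peaks at location 2
```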
Effects of attention
Spatial Invariance
Spatial Attention
Feature Attention
Feature Pop-out
Parallel vs. Serial Search
Recognition under clutter: Feature+Spatial
Spatial attention
[Figure: model vs. McAdams and Maunsell ’99; unattended and attended responses plotted against stimulus orientation (0 to 180)]
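McAdams and Maunsell reported that spatial attention scales V4 orientation tuning curves multiplicatively, without a significant change in tuning width. The sketch below shows what such a gain change looks like numerically. The Gaussian tuning curve, its parameters, and the gain of 1.3 are hypothetical choices, not values fitted to their data or produced by the model on the slide.

```python
import math


def tuning(theta, pref=90.0, width=25.0, r_max=40.0, baseline=5.0):
    """Hypothetical Gaussian orientation tuning curve (spikes/s); theta in degrees, period 180."""
    d = abs(theta - pref) % 180.0
    d = min(d, 180.0 - d)  # circular distance between orientations
    return baseline + r_max * math.exp(-0.5 * (d / width) ** 2)


def attended(theta, gain=1.3):
    """Multiplicative modulation: every point of the curve is scaled by `gain`."""
    return gain * tuning(theta)


def half_max_width(curve, step=0.5):
    """Full width (degrees) where the response is at least half its peak, by grid search."""
    grid = [k * step for k in range(int(180 / step))]
    peak = max(curve(t) for t in grid)
    return step * sum(1 for t in grid if curve(t) >= peak / 2)


for theta in range(0, 181, 30):
    print(theta, round(tuning(theta), 1), round(attended(theta), 1))
print(half_max_width(tuning), half_max_width(attended))  # same width
```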
Feature-based attention
[Figure: Bichot and Desimone ’05 vs. model; responses for four conditions (P = preferred, NP = non-preferred): P stim/P cue, NP stim/P cue, P stim/NP cue, NP stim/NP cue; axis 0 to 0.5]

Contrast gain vs. Response gain
Martinez-Trujillo and Treue ’02; McAdams and Maunsell ’99
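The two hypotheses are often formalized with the Naka-Rushton contrast-response function R(c) = r_max · c^n / (c^n + c50^n). Contrast gain divides the semi-saturation contrast c50 (equivalent to boosting effective contrast, so the curve shifts leftward), while response gain multiplies r_max (the curve is scaled at every contrast). The parameter values below are hypothetical and only illustrate the qualitative difference.

```python
C50, R_MAX, N = 0.2, 1.0, 2.0  # hypothetical unattended parameters


def naka_rushton(c, r_max=R_MAX, c50=C50, n=N):
    """Contrast-response function R(c) = r_max * c^n / (c^n + c50^n)."""
    return r_max * c**n / (c**n + c50**n)


def contrast_gain(c, alpha=2.0):
    # Attention acts like an increase in effective contrast: c50 shrinks by a factor alpha.
    return naka_rushton(c, c50=C50 / alpha)


def response_gain(c, beta=1.5):
    # Attention scales the output: r_max grows by a factor beta.
    return naka_rushton(c, r_max=R_MAX * beta)


for c in (0.05, 0.1, 0.2, 0.5, 1.0):
    print(c, round(naka_rushton(c), 3), round(contrast_gain(c), 3), round(response_gain(c), 3))
```

At saturating contrast the contrast-gain curve converges on the unattended one, whereas the response-gain curve stays 1.5 times higher; this high-contrast behavior is what separates the two accounts.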
Psychophysics
(joint work with Cheston Tan)
The model can predict human eye movements

[Figure: predicted fixations, bottom-up vs. top-down (spatial attention and feature attention)]

Method | ROC area (absolute)
Bruce and Tsotsos ’06 | 72.8%
Itti et al. ’01 | 72.7%
Proposed | 77.9%

Method | ROC area (Cars) | ROC area (Pedestrian)
Itti et al. ’01 | 42.3% | 42.3%
Torralba et al. | 78.9% | 77.1%
Proposed | 80.4% | 80.1%
Humans | 87.8% | 87.4%
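ROC area for fixation prediction is commonly computed by treating saliency values at human-fixated locations as positives and values at control locations as negatives; the area then equals the probability that a randomly chosen positive outscores a randomly chosen negative (ties count one half). The slides do not say how control locations or the human baseline were chosen, so the scores below are invented and `roc_area` is a generic sketch, not the evaluation code behind the table.

```python
def roc_area(positives, negatives):
    """Area under the ROC curve via the Mann-Whitney statistic (O(n*m), fine for small sets)."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical saliency values at fixated and at control (non-fixated) locations.
at_fixations = [0.9, 0.7, 0.6, 0.4]
at_controls = [0.8, 0.5, 0.3, 0.2, 0.1]
print(roc_area(at_fixations, at_controls))  # 0.8
```

An area of 0.5 corresponds to chance and 1.0 to perfect separation of fixated from control locations.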
Recognition performance improves with attention

Chikkerur, Serre, Tan & Poggio (in prep)


Relation to prior work
Thank you
Examples
Quantitative evaluation: ROC
Integrating (local) feature-based + (global) context-based cues accounts for 92% of inter-subject agreement!
[Figure: ROC area (0 to 1) for car and pedestrian search; bars for humans, bottom-up, top-down (feature-based), and feature-based + contextual cues]
Chikkerur, Tan, Serre & Poggio (SFN ’09, VSS ’09)


Effect of clutter on detection
[Figure panels: recognition without attention; recognition under attention]


Scale and location prediction
Performance improves under attention
[Figure: performance (d’) with no attention vs. one shift of attention, for the model and for humans]
Tan, Chikkerur, Serre & Poggio (VSS ’09)
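The sensitivity measure d’ on the y-axis is the standard signal-detection quantity d’ = z(hit rate) − z(false-alarm rate), where z is the inverse of the standard normal CDF. A minimal sketch using Python's `statistics.NormalDist`; the hit and false-alarm rates in the example are hypothetical, not the data plotted on the slide.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Signal-detection sensitivity d' = z(H) - z(FA).

    Rates must lie strictly between 0 and 1 (inv_cdf raises otherwise); rates of
    exactly 0 or 1 are usually corrected before this step.
    """
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates for two conditions.
print(round(d_prime(0.70, 0.30), 2))  # weaker discrimination
print(round(d_prime(0.90, 0.10), 2))  # 2.56
```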
