Professional Documents
Culture Documents
Yutaka Sasaki
NaCTeM
School of Computer Science
Correct
codes
Top 5
Candidates
Significance
of each
feature
• Text Classification
– No query, interactions, external information
– Decide topics of documents 7
©2008 Yutaka Sasaki, University of Manchester
Examples of relevant technologies
web documents
web documents
web documents
web documents
Date: 04/12/03
Place: London
Type: traffic
Casualty: 5
web documents
about accidents
sports
economics
web documents
… science
2.Create a classifier (model) classifier
life
d1 Y N 1
Y Y 1
d2
d3 N Y 1
d4 N N 1
d5 Y N 1
d1 N Y 1
… … …
… …
Y 1
d10 N
N N
d11- … … 990
d1000 N N
d1 Y N 1
Y Y 1
d2
d3 N Y 1
d4 N N 1
d5 Y N 1
Precision = TP = 1 = 0.333
TP+FP 1+2
©2008 Yutaka Sasaki, University of Manchester 29
Issue in Precision
• When a system outputs only confident topics, the
precision easily reaches a high percentage.
system’s prediction correct answer TP FP FN TN
d1 N Y 1
… … …(Y or N)… … …
N 1
d999 N
d1000 Y Y 1
Precision = TP = 1 =1
TP+FP 1
©2008 Yutaka Sasaki, University of Manchester 30
Recall (sensitivity)
• The rate of correctly predicted topics
correct answer system’s prediction
d1 Y N 1
Y Y 1
d2
d3 N Y 1
d4 N N 1
d5 Y N 1
Recall = TP = 1 = 0.5
TP+FN 1+1
©2008 Yutaka Sasaki, University of Manchester 32
Issue in Recall
• When a system outputs loosely, the recall easily
reaches a high percentage.
system’s prediction correct answer TP FP FN TN
d1 Y Y 1
… … …(Y or N)… … …
N 1
d999 Y
d1000 Y Y 1
Recall = TP = n =1
TP+FN n
Precision + Recall
d1 Y N 1
Y Y 1
d2
d3 N Y 1
d4 N N 1
d5 Y N 1
Generally, feature vectors are very sparse, i.e., most of the values are 0.
OTA B C CTA
©2008 Yutaka Sasaki, University of Manchester 45
©2008 Yutaka Sasaki, University of Manchester 46
Case Studies
CTA
OTA
CTA
OTA