Professional Documents
Culture Documents
Tokyo, 11/12/2014
11/12/2014, Tokyo
/25
TQI: recency
Source: https://www.google.co.uk/search?q=google+stock+price
11/12/2014, Tokyo
/25
TQI: future
Source: https://www.google.co.uk/search?q=weather+forecast+manchester
11/12/2014, Tokyo
/25
TQI: past
Source: https://www.google.co.uk/search?q=who+was+eliminated+on+dancing+with+the+stars
11/12/2014, Tokyo
/25
TQI: atemporal
Source: https://www.google.co.uk/search?q=who+was+eliminated+on+dancing+with+the+stars
11/12/2014, Tokyo
/25
the data
training set
80 instances +
test
300 instances
proposed approach
data-driven rather than rule-based
low-sparsity attributes
external resources:
NLTK
extraction system
[1] G. H. Dias, M. Hasanuzzaman, S. Ferrari, and Y. Mathet. TempoWordNet for sentence time tagging. In
Proceedings of the 23rd International Conference on World Wide Web Companion, pages 833838, Republic
and Canton of Geneva, Switzerland, 2014.
11/12/2014, Tokyo
/25
1
ManTIME
usage
[1] M. Filannino, G. Brown, and G. Nenadic. ManTIME: Temporal expression identification and
normalization in the TempEval-3 challenge. In Proceedings of SemEval 2013, pages 5357,
Atlanta, USA, June 2013. ACL.
11/12/2014, Tokyo
10 /25
trigger classes
Feature selection RELIEF algorithm
BOW representation
4 dictionaries (1 per class)
PAST
RECENCY
FUTURE
ATEMPORAL
ancient
days
death
did
history
last
months
actual
cost
costs
current
daily
day
direction
agenda
calendar
chance
coming
dates
forecast
forthcoming
chords
lyrics
21 triggers
44 triggers
2 triggers
27 triggers
11/12/2014, Tokyo
11
/25
attributes
#
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Attribute description
Is it a Wikipedia page title?
Does it contain a temporal expression?
Submissions term
Submissions trimester
Timing
Most frequent trigger class
Wh type
Most frequent TempoWordNet class
Most frequent POS tag tense
Most frequent coarse-grained POS tag
Trigger classes footprint
Temporal between submission and query
Tenses footprint
Ordered TempoWordNet classes
Most frequent fine-grained POS tag
Coarse-grained POS tag ordered footprint
Fine-grained POS tag ordered footprint
Coarse-grained POS tag footprint
Fine-grained POS tag footprint
Sparsity
2
2
3
4
4
5
5
5
7
8
11
16
18
18
21
119
202
204
265
Example
Input (query/time) attribute value
New York Times YES
june 2013 movies YES
Feb 28, 2013 GMT+0 B
Aug 26, 2013 GMT+0 M2
Movies 2012, Feb 28, 2013 past
peso dollar exchange rate present
how did hitler die how
current stock prices present
what is stop kony 2012 VBZ
kony 2012 fake N
what was I thinking lyrics past-atemporal
fathers day 2010, Feb 28, 2013 36.0
when does fall start VBZ-VB
the last song past-future-presentkony 2012 fake NN
when is labour day N-W-V
when is labour day NN-WRB-VBZ
when is labour day W-V-N-N
when is labour day WRB-VBZ-NN-NN
11/12/2014, Tokyo
12 /25
run 1: minimal
#
classifier:
SVM with polynomial
kernel
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Sparsity
Attribute description
Is it a Wikipedia page title?
Does it contain a temporal expression?
Submissions term
Submissions trimester
Timing
Most frequent trigger class
Wh type
Most frequent TempoWordNet class
Most frequent POS tag tense
Most frequent coarse-grained POS tag
Trigger classes footprint
Temporal between submission and query
Tenses footprint
Ordered TempoWordNet classes
Most frequent fine-grained POS tag
Coarse-grained POS tag ordered footprint
Fine-grained POS tag ordered footprint
Coarse-grained POS tag footprint
Fine-grained POS tag footprint
11/12/2014, Tokyo
2
2
3
4
4
5
5
5
7
8
11
16
18
18
21
119
202
204
265
13 /25
run 2: intermediate
#
classifier:
SVM with polynomial
kernel
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Sparsity
Attribute description
Is it a Wikipedia page title?
Does it contain a temporal expression?
Submissions term
Submissions trimester
Timing
Most frequent trigger class
Wh type
Most frequent TempoWordNet class
Most frequent POS tag tense
Most frequent coarse-grained POS tag
Trigger classes footprint
Temporal between submission and
Tenses footprint
Ordered TempoWordNet classes
Most frequent fine-grained POS tag
Coarse-grained POS tag ordered footprint
Fine-grained POS tag ordered footprint
Coarse-grained POS tag footprint
Fine-grained POS tag footprint
11/12/2014, Tokyo
2
2
3
4
4
5
5
5
7
8
11
16
18
18
21
119
202
204
265
14 /25
run 3: full
#
classifier:
Random Forests
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Sparsity
Attribute description
Is it a Wikipedia page title?
Does it contain a temporal expression?
Submissions term
Submissions trimester
Timing
Most frequent trigger class
Wh type
Most frequent TempoWordNet class
Most frequent POS tag tense
Most frequent coarse-grained POS tag
Trigger classes footprint
Temporal between submission and
Tenses footprint
Ordered TempoWordNet classes
Most frequent fine-grained POS tag
Coarse-grained POS tag ordered footprint
Fine-grained POS tag ordered footprint
Coarse-grained POS tag footprint
Fine-grained POS tag footprint
11/12/2014, Tokyo
2
2
3
4
4
5
5
5
7
8
11
16
18
18
21
119
202
204
265
15 /25
Accuracy
75
50
25
55.00%
61.33%
66.33%
Intermediate
Minimal
0
Full
11/12/2014, Tokyo
16 /25
results: 5 x 10 cross-fold v.
100
Accuracy
75
50
73.18%
77.95%
78.33%
Full
Intermediate
Minimal
25
11/12/2014, Tokyo
17 /25
a posteriori fix
100
Accuracy
75
50
25
55.00%
61.33%
66.33%
Intermediate
Minimal
72.33%
0
Full
Minimal
fixed
11/12/2014, Tokyo
18 /25
confusion matrix
Classified as
Recency
Past
Future
Atemporal
43
21
11
60
Future
38
35
Atemporal
61
Recency
Past
minimal run
11/12/2014, Tokyo
20 /25
confusion matrix
Classified as
Recency
Past
Future
Atemporal
43
21
11
60
Future
38
35
Atemporal
61
Recency
Past
minimal run
11/12/2014, Tokyo
21 /25
dicult queries
iPhone 5 release date
season 2 dexter
11/12/2014, Tokyo
22 /25
dicult queries
iPhone 5 release date
season 2 dexter
11/12/2014, Tokyo
23 /25
temporal footprint
a continuous period on the time-line that temporally
defines the existence of a particular concept.
Source: Filannino, M., Nenadic G. Mining temporal footprints from Wikipedia. Proceedings of
the First AHA!-Workshop on Information Discovery in Text. (COLING 2014) (Dublin, Ireland,
August 2014), ACL.
11/12/2014, Tokyo
24 /25
online material
Source: http://www.cs.man.ac.uk/~filannim/projects/temporalia/
11/12/2014, Tokyo
25 /25
Thank
you
S
N
O
I
T
S
E
QU
Contact:
filannim@cs.man.ac.uk
Machine
Learning
Natural Language
Processing
Statistics
Semi-structured
data
Text
Mining
Linguistics
Parallel
computing
11/12/2014, Tokyo
28 /25
the task
Temporal aspects of events provide a natural
mechanism for organising information
/25
11/12/2014, Tokyo
30 /25
example
11/12/2014, Tokyo
31 /25
11/12/2014, Tokyo
32 /25
example: events
class: REPORTING
class: OCCURRENCE
11/12/2014, Tokyo
33 /25
example: links
is included
is included
11/12/2014, Tokyo
34 /25
11/12/2014, Tokyo
35 /25
visual representation
released,
saying
2020
surge
11/12/2014, Tokyo
36 /25
TempEval-3 results
Identification
Research group
Normalisation
accuracy
Overall
score
Prec.
Rec.
F1
0.93
0.88
0.9
0.86
0.776
US Naval Academy
0.89
0.91
0.9
0.79
0.71
0.95
0.85
0.9
0.77
0.69
Stanford University
0.89
0.91
0.9
0.75
0.674
0.98
0.75
0.85
0.77
0.656
0.94
0.87
0.9
0.72
0.647
Jadavpur University
0.93
0.8
0.86
0.74
0.638
0.93
0.76
0.84
0.75
0.63
0.9
0.8
0.85
0.68
0.582
Rule-based
Machine learning-based
11/12/2014, Tokyo
37 /25
model selection
93 features, 4 models:
Source: Filannino, M., and Nenadic G. ManTIME: Temporal expression extraction with
systematic feature type selection and a posteriori label adjustment. Journal of Information
processing and Management: Special Issue on Time and Information Retrieval, (2014),
Elsevier. (under review)
*5x10-fold cross validation
11/12/2014, Tokyo
38 /25
temporal footprint
Source: Filannino, M., Nenadic G. Mining temporal footprints from Wikipedia. Proceedings of
the First AHA!-Workshop on Information Discovery in Text. (COLING 2014) (Dublin, Ireland,
August 2014), ACL.
11/12/2014, Tokyo
40 /25
evaluation
subjects: people
lived from 1000 AD to 2014
41 /25
results
Galileo Galilei (1564-1642), prediction: 1556-1654
Error: 0.204
11/12/2014, Tokyo
42 /25
results
Computer (1940-today), prediction: 1882-1982
Source: http://www.cs.man.ac.uk/~filannim/projects/temporal_footprints/
11/12/2014, Tokyo
43 /25
application?
Source: http://start.csail.mit.edu/answer.php?query=
11/12/2014, Tokyo
44 /25
07/02/2011
General
Tests
Treatments
Problems
admission
transfer
SBP ~80
Ativan 2mg
BAL 383
Saline bolus 2l
hypotensive
discharge
SBP stable
08/02/2011
stable
Source: Kovaevi, A., Dehghan, A., Filannino, M., Keane, J. A., and Nenadic, G. Combining
rules and machine learning for extraction of temporal expressions and events from clinical
narratives. Journal of American Medical Informatics (2013).
hands tremor
improved
11/12/2014, Tokyo
45 /25
clinical data
disease progression
modelling
11/12/2014, Tokyo
46 /25
st
1
year backup
presentation: NTCIR-11 Temporalia
identification techniques
ACE-2004 dev & eval
TempEval Task#15
TempEval-3 Task#1
(TERN2004 corpus)
(in SemEval07)
(in SemEval13)
TempEval-2 Task#13
TimeML
(in SemEval10)
(standard)
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
TimeBank
SVM
(corpus)
(machine learning)
(machine learning)
(rule-based)
(machine learning)
(machine learning)
11/12/2014, Tokyo
48 /25
scientific interest
temporal expressions AND clinical
70
63
61
56
49
49
46
42
46
43
38
35
28
25
21
18
14
7
0
10
2000
16
15
2003
2004
12
2001
2002
2005
2006
2007
2008
2009
2010
11/12/2014, Tokyo
2011
49 /25
50 /25
ISO-TimeML
DATE
[YYYY-MM-DD]
TIME
[date]T[hh:mm:ss]
SET
P[[n][Y/M/D/w/h/m/s]]
DURATION
R[n][set]
11/12/2014, Tokyo
51 /25
temporal forms
time or date references
durations
recurring times
context-dependent
times
vague references
times indicated by an
event
11/12/2014, Tokyo
52 /25
temporal binding
fully-qualified: no reference to any other temporal
entity
11/12/2014, Tokyo
53 /25
NorMA architecture
design, implement and evaluate a novel:
identification architecture
normalisation architecture
54 /25
identification architecture
normalisation architecture
55 /25
identification architecture
normalisation architecture
56 /25
temporal expression
type
ISO-8601 representation
(value)
rule name
11/12/2014, Tokyo
57 /25
identification architecture
normalisation architecture
58 /25
identification architecture
normalisation architecture
59 /25
11/12/2014, Tokyo
60 /25
example: identification
Admission Date :
02/01/2002
Discharge Date :
02/08/2002
Unisys must
about ILLNESS
$100 million
HISTORY
OF pay
PRESENT
: in interest every quarter, on
top of $27
million
dividendswoman
on preferred
Saujule
Study
is a in
77-year-old
with astock.
history of obesity and
hypertension who presents with increased shortness of breath x 5
days. Her shortness of breath has been progressive over the last
2-3 years. On admission , she was diuresed with Lasix and was
negative 1-2 liters per day for several days.
11/12/2014, Tokyo
61 /25
example: normalisation
11/12/2014, Tokyo
62 /25
nd
2
year backup
presentation: NTCIR-11 Temporalia
Factor graph:
11/12/2014, Tokyo
64 /25
factor graph
w-2
w-1
w0
w+1
w+2
w+3
11/12/2014, Tokyo
65 /25
10800
9600
8400
7200
6000
4800
3600
2400
1200
phon_length
phon_last_phoneme
phon_form
phon_first_phoneme
lex_pnp
lex_is_space
lex_chunk
temp_year
temp_weekday
temp_time
temp_temporal_prepositions
temp_temporal_conjunctives
temp_temporal_co-reference
temp_temporal_adverbs
temp_temporal_adjectives
temp_signal
temp_season
temp_present_ref
temp_pod
temp_period
temp_past_ref
temp_ordinal
temp_number
temp_month
temp_modifier
temp_literal_number
temp_fuzzy_quantifier
temp_future_ref
temp_festivity
temp_digit
temp_compound
temp_cardinal
lex_unusual
lex_last_s
lex_is_upper
lex_is_title
lex_is_numeric
lex_is_lower
lex_is_digit
lex_is_decimal
lex_is_alpha
lex_is_alnum
lex_is_all_digits_and_dots
lex_is_all_caps_and_dots
lex_has_symbols
lex_has_digit
lex_first_upper
gaz_stopword
gaz_male_names
gaz_festivities
gaz_female_names
TIMEX3 (class)
lex_polarity
gaz_uscities
gaz_nationalities
gaz_iso_countries
gaz_countries
lex_tense
lex_token_with_no_letters_and_numbers
lex_treetagger_pos
lex_pattern
lex_vocal_pattern
lex_extended_pattern
lex_token_with_no_letters
lex_sux
lex_lancaster_stem
lex_prefix
lex_treetagger_lemma
lex_porter_stem
lex_lemma
_word_preprocessed
_word
66 /25
11/12/2014, Tokyo
Post-processing analysis
11/12/2014, Tokyo
67 /25
Temporal
Galileo Galilei
(1564-1642)
ManTIME
wikipedia pages
using dates only
Dante
(1265-1321)
gaussian fit
to be improved
11/12/2014, Tokyo
68 /25
ManTIME architecture
presentation: NTCIR-11 Temporalia
4 models
model selection
11/12/2014, Tokyo
70 /25
That means Unisys must pay about $100 million in interest every
quarter, on top of $27 million in dividends on preferred stock.
11/12/2014, Tokyo
71 /25
TempEval-3
temporal information
extraction
challenge
#
#
annotation
Corpus
purpose
training
TimeBank
documents
TempEval-3 silver
TempEval-3 eval
words
source
183
61.418
experts
training
2.452
666.309
systems
training
20
6.375
experts
testing
Source: TempEval-3 challenge; Corpora released in October 2012 (except the eval).
11/12/2014, Tokyo
72 /25
identification post-processing
CRFs
PCM
BIO
fixer
TbLS
BIO
fixer
11/12/2014, Tokyo
73 /25
processing)
Overall
score
Prec
Rec
F1
Prec
Rec
F1
Type
Value
investigate
the
dierences
between
general
and
clinical
Human&Silver (no)
0.79
0.64
0.7
0.97
0.79
0.87
0.89
0.77
0.672
Human&Silver (yes)
0.8
0.66
0.72
0.97
0.8
0.88
0.87
0.76
0.667
Human (no)
0.76
0.64
0.7
0.95
0.8
0.87
0.87
0.77
0.675
0.78
0.63
0.7
0.97
0.8
0.87
0.89
0.77
0.672
0.66
0.73
0.98
0.79
0.88
0.91
0.78
0.683
domain
use of0.74
the proposed
other0.69
0.79 the 0.7
0.95
0.85 framework
0.9
0.86 to 0.77
investigate
Human (yes)
Silver (no)
domains0.82
Silver (yes)
11/12/2014, Tokyo
74 /25
strict matching
lenient matching
Normalisation
accuracy
Overall
score
Prec
Rec
F1
Prec
Rec
F1
Type
Value
HeidelTime
0.84
0.79
0.81
0.93
0.88
0.9
0.91
0.86
0.776
NavyTime
0.79
0.8
0.8
0.89
0.91
0.9
0.89
0.79
0.71
ManTIME
0.79
0.7
0.74
0.95
0.85
0.9
0.86
0.77
0.69
SUTime
0.79
0.8
0.8
0.89
0.91
0.9
0.89
0.75
0.674
ATT
0.91
0.7
0.79
0.98
0.75
0.85
0.91
0.77
0.656
ClearTK
0.86
0.8
0.83
0.94
0.87
0.9
0.93
0.72
0.647
JU-CSE
0.82
0.7
0.75
0.93
0.8
0.86
0.87
0.74
0.638
KUL
0.77
0.63
0.69
0.93
0.76
0.84
0.89
0.75
0.63
FSS-TimEx
0.52
0.46
0.49
0.9
0.8
0.85
0.81
0.68
0.582
Source: Naushad UzZaman, Hector Llorens, Leon Derczynski, James Allen, Marc Verhagen,
and James Pustejovsky. Semeval-2013 task 1: Tempeval-3: Evaluating time expressions,
events, and temporal relations. Proceedings of the Seventh International Workshop on
Semantic Evaluation (SemEval 2013), pages 1-9, Atlanta, Georgia, USA, June 2013. ACL.
11/12/2014, Tokyo
75 /25
processing)
Overall
score
Prec
Rec
F1
Prec
Rec
F1
Type
Value
investigate
the
dierences
between
general
and
clinical
Human&Silver (no)
0.79
0.64
0.7
0.97
0.79
0.87
0.89
0.77
0.672
Human&Silver (yes)
0.8
0.66
0.72
0.97
0.8
0.88
0.87
0.76
0.667
Human (no)
0.76
0.64
0.7
0.95
0.8
0.87
0.87
0.77
0.675
0.78
0.63
0.7
0.97
0.8
0.87
0.89
0.77
0.672
0.66
0.73
0.98
0.79
0.88
0.91
0.78
0.683
domain
use of0.74
the proposed
other0.69
0.79 the 0.7
0.95
0.85 framework
0.9
0.86 to 0.77
investigate
Human (yes)
Silver (no)
domains0.82
Silver (yes)
11/12/2014, Tokyo
76 /25
strict matching
lenient matching
Normalisation
accuracy
Overall
score
Prec
Rec
F1
Prec
Rec
F1
Type
Value
HeidelTime
0.84
0.79
0.81
0.93
0.88
0.9
0.91
0.86
0.776
NavyTime
0.79
0.8
0.8
0.89
0.91
0.9
0.89
0.79
0.71
ManTIME
0.79
0.7
0.74
0.95
0.85
0.9
0.86
0.77
0.69
SUTime
0.79
0.8
0.8
0.89
0.91
0.9
0.89
0.75
0.674
ATT
0.91
0.7
0.79
0.98
0.75
0.85
0.91
0.77
0.656
ClearTK
0.86
0.8
0.83
0.94
0.87
0.9
0.93
0.72
0.647
JU-CSE
0.82
0.7
0.75
0.93
0.8
0.86
0.87
0.74
0.638
KUL
0.77
0.63
0.69
0.93
0.76
0.84
0.89
0.75
0.63
FSS-TimEx
0.52
0.46
0.49
0.9
0.8
0.85
0.81
0.68
0.582
Source: Naushad UzZaman, Hector Llorens, Leon Derczynski, James Allen, Marc Verhagen,
and James Pustejovsky. Semeval-2013 task 1: Tempeval-3: Evaluating time expressions,
events, and temporal relations. Proceedings of the Seventh International Workshop on
Semantic Evaluation (SemEval 2013), pages 1-9, Atlanta, Georgia, USA, June 2013. ACL.
11/12/2014, Tokyo
77 /25
4 models
model selection
11/12/2014, Tokyo
78 /25
That means Unisys must pay about $100 million in interest every
quarter, on top of $27 million in dividends on preferred stock.
11/12/2014, Tokyo
79 /25
TempEval-3
temporal information
extraction
challenge
#
#
annotation
Corpus
purpose
training
TimeBank
documents
TempEval-3 silver
TempEval-3 eval
words
source
183
61.418
experts
training
2.452
666.309
systems
training
20
6.375
experts
testing
Source: TempEval-3 challenge; Corpora released in October 2012 (except the eval).
11/12/2014, Tokyo
80 /25
identification post-processing
CRFs
PCM
BIO
fixer
TbLS
BIO
fixer
11/12/2014, Tokyo
81 /25
rd
3
year backup
presentation: NTCIR-11 Temporalia
why is it challenging?
Source: Temporal Information Extraction and Shallow Temporal Reasoning, D. Roth et al. 2012
11/12/2014, Tokyo
83 /25
linguistic knowledge
1. Matt exercised(E) during his lunch break(E).
2. He stretched(E), lifted(E) weights, and ran(E).
3. He showered(E), got dressed(E) and returned(E) work.
lunch break
exercised
Source: Temporal Information Extraction and Shallow Temporal Reasoning, D. Roth et al. 2012
11/12/2014, Tokyo
84 /25
linguistic knowledge
1. Matt exercised(E) during his lunch break(E).
2. He stretched(E), lifted(E) weights, and ran(E).
3. He showered(E), got dressed(E) and returned(E) work.
lunch break
exercised
stretch, lift, run
Source: Temporal Information Extraction and Shallow Temporal Reasoning, D. Roth et al. 2012
11/12/2014, Tokyo
85 /25
linguistic knowledge
1. Matt exercised(E) during his lunch break(E).
2. He stretched(E), lifted(E) weights, and ran(E).
3. He showered(E), got dressed(E) and returned(E) work.
lunch break
exercised
Source: Temporal Information Extraction and Shallow Temporal Reasoning, D. Roth et al. 2012
11/12/2014, Tokyo
86 /25
lunch break
exercised
Source: Temporal Information Extraction and Shallow Temporal Reasoning, D. Roth et al. 2012
11/12/2014, Tokyo
87 /25
lunch break
exercised
Source: Temporal Information Extraction and Shallow Temporal Reasoning, D. Roth et al. 2012
11/12/2014, Tokyo
88 /25
lunch break
exercised
shower
dress
return
Source: Temporal Information Extraction and Shallow Temporal Reasoning, D. Roth et al. 2012
11/12/2014, Tokyo
89 /25
domain knowledge
1. Matt exercised(E) during his lunch break(E).
2. He stretched(E), lifted(E) weights, and ran(E).
3. He showered(E), got dressed(E) and returned(E) work.
lunch break
exercised
stretch
run
stretch
shower
dress
return
lift
Source: Temporal Information Extraction and Shallow Temporal Reasoning, D. Roth et al. 2012
11/12/2014, Tokyo
90 /25
Temporal footprint
presentation: NTCIR-11 Temporalia
results
E: 0.159
11/12/2014, Tokyo
92 /25
!
A
H
11/12/2014, Tokyo
93 /25
11/12/2014, Tokyo
94 /25
Temporalia
presentation: NTCIR-11 Temporalia
data
Query
Submission date
CLASS
past
Jan 1, 2013
future
Jan 1, 2013
future
present
present
Movies 2012
training
set: 100 queries
current
price of gold
Amazon Deal of the Day
atemporal
11/12/2014, Tokyo
96 /25
attributes
ID
Query
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Submitted runs
Minimal
Intermediate
Full
11/12/2014, Tokyo
97 /25
error measure
gold
prediction
union
overlap
11/12/2014, Tokyo
98 /25