CS 578
Natural Language Processing
Graduate Course
Computer Engineering
Bilkent University
• Introduction
• WordNet Domains
Mihalcea, R., Tarau, P.: TextRank: Bringing Order into Texts, in Proceedings of the Conference on Empirical
Methods in Natural Language Processing (EMNLP 2004), Barcelona, Spain, July 2004
• "WordNet is a semantic lexicon for the
English language. It groups English words
into sets of synonyms called synsets,
provides short, general definitions, and
records the various semantic relations
between these synonym sets." (en.wikipedia.org)
• hypernyms: Y is a hypernym of X if every X is a (kind of) Y
(canine is a hypernym of dog)
• hyponyms: Y is a hyponym of X if every Y is a (kind of) X
(dog is a hyponym of canine)
• coordinate terms: Y is a coordinate term of X if X and Y
share a hypernym (wolf is a coordinate term of dog, and
dog is a coordinate term of wolf)
• holonym: Y is a holonym of X if X is a part of Y (building is
a holonym of window)
• meronym: Y is a meronym of X if Y is a part of X (window
is a meronym of building)
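
The relations above can be illustrated with a minimal sketch. The two dictionaries are hand-written toy data, not real WordNet content, and the function names are hypothetical; the example words are the ones used on the slide.

```python
# Toy "is a kind of" and "is a part of" links (hand-written, not WordNet data).
HYPERNYM_OF = {"dog": "canine", "wolf": "canine", "canine": "carnivore"}
PART_OF = {"window": "building", "door": "building"}

def hypernyms(word):
    """All ancestors of `word`, walking up the kind-of chain."""
    result = []
    while word in HYPERNYM_OF:
        word = HYPERNYM_OF[word]
        result.append(word)
    return result

def hyponyms(word):
    """Direct hyponyms: every entry whose hypernym is `word`."""
    return sorted(w for w, h in HYPERNYM_OF.items() if h == word)

def coordinate_terms(word):
    """Other words that share the direct hypernym of `word`."""
    parent = HYPERNYM_OF.get(word)
    if parent is None:
        return []
    return sorted(w for w, h in HYPERNYM_OF.items() if h == parent and w != word)

def holonyms(word):
    """The whole that `word` is a part of, if any."""
    return [PART_OF[word]] if word in PART_OF else []

def meronyms(word):
    """The parts that make up `word`."""
    return sorted(p for p, whole in PART_OF.items() if whole == word)
```

For instance, `coordinate_terms("dog")` finds `wolf` because both map to `canine`, and `meronyms("building")` lists the parts recorded for `building`.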
• Words have many possible meanings, called
senses
• A Word Sense Disambiguation (WSD)
algorithm is needed to determine the
correct sense of each word
• WSD
is based on the lexical database WordNet
Banerjee, S., Pedersen, T.: An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet. In the Proceedings of the 3rd International
Conference on Intelligent Text Processing and Computational Linguistics (CICLING-02), Mexico City, Mexico (2002)
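
To make the idea of sense disambiguation concrete, here is a minimal sketch of a simplified Lesk-style gloss overlap. It is not the adapted Lesk algorithm of the cited paper, which also compares the glosses of related synsets. The sense inventory, the glosses, the stopword list, and the function names are all assumptions written for illustration.

```python
import re

# Hypothetical mini sense inventory; glosses are illustrative, not from WordNet.
SENSES = {
    "bank": {
        "bank#finance": "an institution that accepts deposits and lends money",
        "bank#river": "sloping land beside a body of water such as a river",
    }
}

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "and", "that", "as", "such", "to", "in", "on", "at"}

def tokens(text):
    """Lowercased content words of `text` as a set."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}

def simplified_lesk(word, context):
    """Pick the sense whose gloss shares the most words with the context."""
    context_tokens = tokens(context) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES.get(word, {}).items():
        overlap = len(tokens(gloss) & context_tokens)
        if overlap > best_overlap:  # ties keep the first sense listed
            best_sense, best_overlap = sense, overlap
    return best_sense
```

In this toy setup, a context mentioning `river` and `water` selects `bank#river`, while one mentioning `money` selects `bank#finance`. A word missing from the inventory yields `None`.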
• Augment WordNet with
domain labels
• A taxonomy of ~200
domain labels
• Each synset is annotated with
at least one domain
label
WordNet Domains: http://wndomains.itc.it/wordnetdomains.html
• For each video:
  - Extract the WordNet domains for each keyword's sense
  - Calculate the occurrence frequency of each domain label
  - Sort domain labels in decreasing order of occurrence frequency
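
The per-video steps above can be sketched as follows. This is a minimal illustration under stated assumptions: `SENSE_DOMAINS` is a hypothetical hand-written table standing in for the WordNet Domains annotation, the sense identifiers are invented, and keyword senses are assumed to be already disambiguated.

```python
from collections import Counter

# Hypothetical sense-to-domain table standing in for WordNet Domains.
SENSE_DOMAINS = {
    "goal#score": ["sport", "football"],
    "match#contest": ["sport"],
    "pitch#field": ["sport", "football"],
    "bank#finance": ["economy"],
}

def rank_video_domains(keyword_senses):
    """Count domain labels over a video's keyword senses and sort them."""
    counts = Counter()
    for sense in keyword_senses:
        # Senses without an entry contribute no domain labels.
        counts.update(SENSE_DOMAINS.get(sense, []))
    # Decreasing frequency; ties broken alphabetically so the order is stable.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

video_senses = ["goal#score", "match#contest", "pitch#field", "bank#finance"]
ranking = rank_video_domains(video_senses)
```

Here `ranking` places `sport` first because three of the four senses carry that label. The alphabetical tie-break is a design choice of this sketch, not something the slides specify.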
• For each category label:
  - Look up in WordNet the senses related to it (include senses related through hypernym & hyponym relations)
  - Obtain the corresponding WordNet domains
  - Calculate the occurrence score for each domain
  - Sort domains in decreasing occurrence order
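
A minimal sketch of the category-label steps, under several assumptions: the slides do not define the occurrence score, so a plain count is used here; all three tables are hypothetical toy data rather than WordNet or WordNet Domains content; and only direct hypernyms and hyponyms are followed.

```python
from collections import Counter

# Hypothetical toy tables (not real WordNet / WordNet Domains data).
LABEL_SENSES = {"sports": ["sport#activity"]}
RELATED = {
    "sport#activity": {
        "hypernyms": ["diversion#activity"],
        "hyponyms": ["football#game", "tennis#game"],
    }
}
SENSE_DOMAINS = {
    "sport#activity": ["sport"],
    "diversion#activity": ["free_time"],
    "football#game": ["sport", "football"],
    "tennis#game": ["sport", "tennis"],
}

def expand_senses(label):
    """Senses of a label plus their direct hypernyms and hyponyms."""
    senses = []
    for sense in LABEL_SENSES.get(label, []):
        senses.append(sense)
        links = RELATED.get(sense, {})
        senses.extend(links.get("hypernyms", []))
        senses.extend(links.get("hyponyms", []))
    return senses

def rank_category_domains(label):
    """Score each domain by how often it occurs over the expanded senses."""
    counts = Counter()
    for sense in expand_senses(label):
        counts.update(SENSE_DOMAINS.get(sense, []))
    # Decreasing score; ties broken alphabetically so the order is stable.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

category_ranking = rank_category_domains("sports")
```

With this toy data the label `sports` is dominated by the `sport` domain, with the labels contributed by related senses trailing behind.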