Dogan Kaya Berktas


berktas@cs.bilkent.edu.tr

CS 578
Natural Language Processing
Graduate Course
Computer Engineering
Bilkent University

A platform for movie indexing via subtitle analysis

• Introduction
• Video Categorization Method
• WordNet Domains
• Conclusions - Future Work


Introduction
• Multimedia databases are becoming popular
• Most video classification methods are based on visual/audio signal processing
• Text processing is more lightweight than visual/audio processing
• High-level semantics are more closely related to human language than to visual features
• Subtitles capture the semantics of the corresponding video
Text Preprocessing
• Subtitles are segmented into sentences
• A Part-of-Speech tagger is applied to each sentence
• Stop words are removed based on a stop-word list
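
The segmentation and stop-word steps can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the presented work: the stop-word list, the regular expressions, and the function names are all assumptions, and the Part-of-Speech tagging step is left out because it needs an external tagger.

```python
import re

# Tiny illustrative stop-word list (an assumption; the slides do not give the list used).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "on", "to", "and", "it", "at"}


def split_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def remove_stop_words(sentence):
    """Lowercase the sentence, tokenize on letters, and drop stop words."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess(subtitle_text):
    # A POS tagger would run on each sentence between these two steps;
    # it is omitted in this sketch.
    return [remove_stop_words(s) for s in split_sentences(subtitle_text)]
```

For example, `preprocess("The lion is hunting. It was at the river!")` yields one token list per sentence, with the stop words removed.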
• TextRank algorithm to extract keywords
• TextRank:
  - represents the text as a graph
  - ranks the vertices with an algorithm based on Google's PageRank
  - sorts vertices in decreasing rank order
  - extracts the top-ranked vertices for further processing
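
The ranking idea can be illustrated with a small, simplified sketch. It is not the reference TextRank implementation: it assumes an unweighted, undirected co-occurrence graph, and the window size, damping factor, iteration count, and function name are illustrative choices.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words with a PageRank-style score over a co-occurrence graph."""
    # Build an undirected graph: words that co-occur within `window`
    # positions of each other are linked.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Repeat the PageRank update a fixed number of times.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Sort vertices by decreasing score (ties broken alphabetically)
    # and keep the top-ranked ones.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]
```

The input is expected to be the token stream left after preprocessing; words that never co-occur with another word do not enter the graph and are therefore never returned.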

Mihalcea, R., Tarau, P.: TextRank: Bringing Order into Texts. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), Barcelona, Spain, July 2004
WordNet
• "WordNet is a semantic lexicon for the English language. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets." (en.wikipedia.org)
WordNet relations
• hypernyms: Y is a hypernym of X if every X is a (kind of) Y (canine is a hypernym of dog)
• hyponyms: Y is a hyponym of X if every Y is a (kind of) X (dog is a hyponym of canine)
• coordinate terms: Y is a coordinate term of X if X and Y share a hypernym (wolf is a coordinate term of dog, and dog is a coordinate term of wolf)
• holonym: Y is a holonym of X if X is a part of Y (building is a holonym of window)
• meronym: Y is a meronym of X if Y is a part of X (window is a meronym of building)
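
These relations can be made concrete with a toy example. The taxonomy below is hand-written to mirror the examples on the slide; it is not WordNet data, and the dictionary and function names are assumptions made for illustration.

```python
# Toy "is a kind of" links: child -> parent. Hand-written, not WordNet data.
HYPERNYM_OF = {"dog": "canine", "wolf": "canine", "canine": "carnivore"}
# Toy part-whole links: part -> whole.
PART_OF = {"window": "building", "door": "building"}


def hypernym(word):
    """Return the direct hypernym of `word`, or None if there is none."""
    return HYPERNYM_OF.get(word)


def hyponyms(word):
    """Return the words whose direct hypernym is `word`."""
    return sorted(child for child, parent in HYPERNYM_OF.items() if parent == word)


def coordinate_terms(word):
    """Return the other words that share a direct hypernym with `word`."""
    parent = hypernym(word)
    if parent is None:
        return []
    return [w for w in hyponyms(parent) if w != word]


def holonym(word):
    """Return the whole that `word` is a part of, or None."""
    return PART_OF.get(word)


def meronyms(word):
    """Return the parts of `word`."""
    return sorted(part for part, whole in PART_OF.items() if whole == word)
```

With this toy data, `coordinate_terms("dog")` finds `wolf` because both share the hypernym `canine`, and `meronyms("building")` lists `door` and `window`.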

      
• Words have many possible meanings, called senses
• A Word Sense Disambiguation (WSD) algorithm is needed to determine the correct sense of each word
• WSD is based on the lexical database WordNet
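
The gloss-overlap idea behind Lesk-style disambiguation can be sketched as below. This is the basic simplified Lesk variant, not the Adapted Lesk algorithm cited on this slide (which also compares glosses of related synsets). The sense inventory, glosses, and sense names are hand-written assumptions, not WordNet entries.

```python
# Toy sense inventory with hand-written glosses (illustrative, not WordNet entries).
SENSES = {
    "bank": {
        "bank_financial": "an institution that accepts money deposits and makes loans",
        "bank_river": "sloping land beside a river or lake",
    }
}


def simplified_lesk(word, context_tokens):
    """Return the sense whose gloss shares the most words with the context."""
    context = set(context_tokens)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES.get(word, {}).items():
        overlap = len(context & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense
```

A word that is missing from the inventory returns `None`; when several senses tie, the first one listed wins, which is an arbitrary choice in this sketch.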

Banerjee, S., Pedersen, T.: An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet. In Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-02), Mexico City, Mexico (2002)
WordNet Domains
• Augment WordNet with domain labels
• A taxonomy of ~200 domain labels
• Each synset is annotated with at least one domain label

WN domains: http://wndomains.itc.it/wordnetdomains.html
• For each video:
  - Extract the WordNet domains for each keyword's sense
  - Calculate the occurrence frequency of each domain label
  - Sort the domain labels in decreasing order of occurrence frequency
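
The per-video steps above amount to a frequency count over domain labels. The sketch below assumes the keywords have already been disambiguated; the sense identifiers and the sense-to-domain table are hypothetical stand-ins for WordNet Domains, and the function name is illustrative.

```python
from collections import Counter

# Hypothetical sense -> domain labels table standing in for WordNet Domains.
SENSE_DOMAINS = {
    "lion": ["animals"],
    "zebra": ["animals"],
    "savanna": ["geography"],
    "cell": ["biology"],
}


def video_domain_profile(keyword_senses):
    """Count domain labels over the keyword senses, most frequent first."""
    counts = Counter()
    for sense in keyword_senses:
        # Senses without a known domain contribute nothing.
        counts.update(SENSE_DOMAINS.get(sense, []))
    # Sort by decreasing frequency; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Repeated keywords are counted each time they occur, so a sense that appears often in the subtitles pushes its domains up the ranking.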
• For each category label:
  - Look up in WordNet the senses related to it (including senses related through hypernym and hyponym relations)
  - Obtain the corresponding WordNet domains
  - Calculate the occurrence score for each domain
  - Sort the domains in decreasing order of occurrence score
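
A category profile can be built in the same spirit, after expanding the label's senses through hypernym and hyponym links. Everything in this sketch is hypothetical toy data, and using a raw count as the "occurrence score" is an assumption, since the slides do not define the score.

```python
from collections import Counter

# Toy data, all hypothetical: senses per category label, senses reachable
# through hypernym/hyponym links, and domain labels per sense.
LABEL_SENSES = {"sport": ["sport_activity"]}
RELATED_SENSES = {"sport_activity": ["recreation", "football", "tennis"]}
SENSE_DOMAINS = {
    "sport_activity": ["sport"],
    "recreation": ["free_time"],
    "football": ["sport"],
    "tennis": ["sport"],
}


def category_domain_profile(label):
    """Rank the domains of a category label's senses and related senses."""
    senses = []
    for sense in LABEL_SENSES.get(label, []):
        senses.append(sense)
        # Include senses related through hypernym and hyponym relations.
        senses.extend(RELATED_SENSES.get(sense, []))

    counts = Counter()
    for sense in senses:
        counts.update(SENSE_DOMAINS.get(sense, []))
    # Raw counts serve as the occurrence score here (an assumption).
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

With the toy data, the label `sport` is dominated by the `sport` domain, with `free_time` contributed by the related sense `recreation`.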
Example: WN domains of a video (figure; recoverable labels: animals, science)
• Conclusions
  - An approach that is based only on text and uses natural language processing techniques
  - No training phase (unsupervised approach)
  - WordNet Domain mapping
• Future Work
  - Definition of domain knowledge closer to movie classification (MPEG-7)
  - Improved WSD
