http://ntcir.nii.ac.jp/CrossLink
############################################################
INTRODUCTION:
can be viewed as a process of creating a virtual link between the provided cross-
lingual query and the retrieved documents; on the other hand, CLLD
use them as queries with the contextual information from the text to establish
hypertext links between documents of the same language for easy reading and
except for the cross-lingual link between pages about the same subject. This
could pose serious difficulties for users who try to seek information or knowledge
break the language barrier in knowledge sharing. With CLLD, users can
discover documents in languages with which they either are familiar (or not), or
For English there are several link discovery tools, which assist topic curators in
tools yet exist that support the cross-linking of documents from multiple
languages. This task aims to incubate the technologies assisting CLLD and
manner. The language difference, ambiguities and other language issues such
as Chinese segmentation could all make this task even more challenging.
Researchers who are interested in cross-lingual link discovery are all welcome to join
us. In particular, researchers from either the CLIR or the link discovery community are
TASK DEFINITION:
incoming, but in this task we mainly focus on the outgoing link starting from
English source documents and being pointed to Chinese, Korean, and Japanese
subtasks:
Participants may take part in one or more of the three subtasks above.
The English topics and the target corpus consist of actual Wikipedia pages in XML
format with rich structured information. To submit a run, participants are required
to choose the most suitable anchors from the topic document, and for each
anchor identify the most suitable documents in the test corpus. For each topic we
Two sets of 25 articles chosen from the English Wikipedia will be used as topics
for the dry run and the formal run, respectively. These topics will be
orphaned by removing all links to them (from the collection) and from them (to the
collection). The corresponding pages in Chinese, Japanese and Korean will also
The training and test collections for the three subtasks are exactly the same. The
collections consist of search-engine-friendly XML files created from Wikipedia
MySQL database dumps taken in June 2010. The details of the collections are
given as follows (the language of the corpus, the number of articles, the size of
done by human assessors. For the latter, all submissions will be pooled and a
GUI tool for efficient assessment will be used. In manual assessment, either the
anchor candidate or the target link could be identified as relevant (or non-relevant).
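
The two levels of judgement mentioned here (the anchor candidate and each target link) can be pictured as a small record per anchor. The sketch below is only an illustration and is not the organizers' assessment tool: the class name, field names, and document ids are hypothetical, and it assumes that an anchor judged non-relevant makes every link attached to it non-relevant.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AnchorJudgement:
    """Assessor decisions for one anchor candidate (hypothetical structure)."""

    anchor_text: str
    # None means the anchor itself has not been assessed yet.
    anchor_relevant: Optional[bool] = None
    # Maps a target document id to the assessor's decision for that link.
    link_relevant: Dict[str, bool] = field(default_factory=dict)

    def effective_links(self) -> Dict[str, bool]:
        """Return link decisions after applying the anchor-level decision."""
        if self.anchor_relevant is False:
            # Assumption: a non-relevant anchor makes all of its links non-relevant.
            return {doc_id: False for doc_id in self.link_relevant}
        return dict(self.link_relevant)


# Hypothetical example: two target links judged individually,
# then the anchor itself is judged non-relevant.
judgement = AnchorJudgement("link discovery")
judgement.link_relevant = {"zh-101": True, "zh-202": False}
judgement.anchor_relevant = False
print(judgement.effective_links())
```

Keeping the per-link decisions separate from the anchor decision means that, under this assumption, reversing the anchor judgement restores the original link judgements without re-assessment.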
associated links inside this anchor will become non-relevant. After the
Please visit
http://ntcir.nii.ac.jp/CrossLink
Please also note that the registration deadline is December 20, 2010 (for all
NTCIR-9 tasks).
ORGANIZERS: