
Hunter X Scholar – Figure Out the Famous Scholars in Your Research Area

Chi-Jen Wu
Electrical Engineering
National Taiwan University
cjwu@arbor.ee.ntu.edu.tw

Abstract
With the growth of the WWW, publishing research information on the web has
become common practice for scientists and researchers: an enormous number of
web pages on the Internet provide information on scientists, research papers, and
technical documents, and are indexed by search engines. For a junior student or
researcher, it is a nontrivial task to find the authoritative scholars (experts) in his
or her research area. An interesting challenge therefore arises: how can we identify
the important scholars on a research topic? A capable scholar searching system
requires analyzing the reputation, publications, citations, and activities of a large
number of scholars.
In this project, we present the design and implementation of a prototype scholar
searching system based on a web mining approach. Our system computes a ranking
of the scholars relevant to a given research area, e.g. data mining, and returns the
top-k scholars. We also designed the %-index, a new ranking function for positioning
a scholar within a specific research field. The ranking criteria are based on
publications and citations, computed from the query results of Google and Google
Scholar. Based on our experimental results, our approach outperforms other existing
approaches within a specific research field.

1 Introduction
With the growth of the WWW, publishing research information on the web has
become common practice in academia: an enormous number of web pages on the
Internet provide information on scientists, research papers, and technical documents,
and are indexed by search engines. We can expect the contents of the Web to keep
growing as time goes by. The range of research areas has also diversified over the
past decade, with more and more new research areas emerging, such as Data Mining,
P2P systems, Network Coding, and so on. For a junior student or researcher starting
his/her research work, it is a nontrivial task to find the authoritative scholars in a
research area using a general purpose search engine. For example, using the popular
web search engine Google to search for the research field "Data Mining" retrieves
more than 20,800,000 web pages relevant to "data mining"; even Google Scholar [1],
a special purpose search engine for scholarly literature developed by Google, returns
more than 1,400,000 papers in the data mining research area. For most people, this
result set is far too large to pick out the important scholars and their significant
papers.
An interesting challenge therefore arises: how can we identify the important or
famous scholars on a research topic? In fact, constructing rankings of scholarly
authority is a relatively new subfield of information retrieval research. This problem
differs from the traditional expert finding problem [2-5]: in essence, the goal of an
expert finding system is to identify a list of people with appropriate skills and
knowledge about a given topic. The scholar searching problem is a deeper expert
finding problem; it requires not only identifying the right scholars who possess the
required knowledge within a research community, but also ranking their level of
authority in the research field. Consider a simple scenario: a junior researcher wishes
to find a list of scholars who made significant contributions and/or published a
seminal paper in his/her research field. Unfortunately, a traditional expert finding
system using standard IR techniques may return a Ph.D. student, because he/she may
have a particular level of expertise in the research area, yet this Ph.D. student may
not be a famous scholar in the field [10,19]. In general, scholar searching is a more
complex and difficult task than expert finding; in particular, there are no standards
specifying the criteria or qualifications required for particular levels of scholarly
authority.
In this project, we present the design and implementation of a prototype scholar
searching system based on a web mining approach; in this work we focus on the
problems of scholar finding and scholar ranking. Our system computes a ranking of
the scholars relevant to a given research area, e.g. data mining, and returns the top-k
scholars. For scholar finding, we utilize search engines and digital libraries to find a
large set of documents about a given topic and extract the authors from the collected
documents. We then estimate each extracted author's relevance to the given topic
through statistical analysis of web pages. We assume that authors with many articles
on a topic are more likely to be experts on that topic, and that authors with highly
cited papers are likely authorities. For scholar ranking, we design the %-index, a
ranking function for positioning scholars. The ranking criteria are based on
publications and citations, computed from query results obtained from scientific
literature digital archives such as Google Scholar, MS Libra Academic Search [17],
or CiteSeerX [18]. The %-index is a novel way of estimating an individual scholar's
impact in a single research field: a scholar's %-index is m if the total citations of the
scholar's papers amount to m% of the total citations of all papers in the research
field. In summary, we hope our system makes studying more convenient for junior
students. Our system is available at:
http://140.109.22.36/cjwu/dm_scholar/mycgi.htm
Our contributions in this work are: 1) we propose a web mining approach to
searching for famous/authoritative scholars, 2) we develop a flexible ranking
function, the %-index, that facilitates scholar ranking within a narrower research
field, and 3) we implement and demonstrate our scholar searching system as a
working web service. A main advantage of our approach is that users can query any
research topic and obtain a list of authoritative scholars without a dedicated database
for each topic. In addition, an interesting application of our scholar searching system
is automatically routing submitted papers to reviewers for conferences [6,7]. The
assignment of submitted manuscripts to reviewers is a common task for journal
editors and conference chairs; our system can help committees find the right
reviewers under severe time pressure.
In the following section we describe related work on expert finding systems and
previous efforts on ranking scholars and research institutions. Section 3 describes our
system design and methodology. We then present preliminary experimental results
and demonstrate several examples in Section 4. Finally, we conclude in Section 5.

2 Related Work
We consider an effective scholar searching system to be composed of three
tasks: expert finding, expert ranking, and expert profiling. First, expert finding
[2,8-11] is the task of locating, with high probability, the right scholars for a specific
topic. Within a research community such as computer science there may be many
candidates relevant to a given topic; the expert finding step retrieves a list of the
candidates deemed most likely to be scholars for the topic. Second, expert ranking
[15,16] orders the candidates by level of authority; it involves analysis of reputation,
publications, citations, and activities among the candidate scholars. Finally, expert
profiling [12,13] extracts the profile information of an individual scholar from the
Web, including basic information, contact information, and educational history. In
this section, we describe related work on each of these three components.

2.1 Expert Finding
Traditional expert finding identifies a list of people with appropriate skills and
knowledge about a given topic [5]. Most previous efforts rely either on manually
constructed expert databases [20] or on text, citation, or document analysis to match
the user's research topic [3,8,11,14].
Since our research is based on web mining and search engines, we discuss the
relevant work here. In 1997, Kautz et al. [2] developed the first expert extraction
system based on web mining and search engine techniques, called ReferralWeb.
ReferralWeb automatically generates a representation of a social network from
evidence gathered from web pages; based on the links of this social network, it
allows users to search for people who are likely to be experts on a given topic within
a workgroup. More recently, Harada et al. [14] proposed the NEXAS system, an
extension to web search engines that attempts to find real-world entities reflected on
the Web, and used it to search for people relevant to a topic. Zhang et al. [9]
proposed a mixture model for expert finding; its main idea is to utilize Probabilistic
Latent Semantic Analysis (PLSA) [21] to capture the semantic relevance between the
query and the experts. The same authors also developed the Arnetminer system [13],
which addresses several key issues in scholar searching, such as scholar finding and
scholar profiling. In 2007, Microsoft Research Asia developed a similar system, MS
Libra Academic Search [17], a free computer science bibliography search engine
whose principal idea is based on the object-level vertical search technique proposed
by Nie et al. [22].

2.2 Expert Ranking


We are aware of a few systems that employ expert ranking techniques, such as
Arnetminer, Libra, and CiteSeerX. The main idea of Arnetminer is similar to
ReferralWeb, and the other systems are based on information retrieval schemes.
Because there are no standards specifying the criteria for particular levels of
scholarly authority, ranking scholars is a difficult task that is unlikely to yield a
unanimous solution. The best-known ranking index is the impact factor, defined as
the average number of citations per paper published in a journal over a two-year
period. In 2005, the h-index [16] was proposed to measure an individual scholar's
impact: a scholar has an h-index of h if he/she has published h papers that have each
received at least h citations. More recently, Ren and Taylor [15] provided an
automatic publication-ranking framework to support such rankings for scholars and
research institutions; they discussed the most important ranking policies and pointed
out some problems of publication-based ranking. In our scholar searching system, we
focus on a user-defined research area. The criteria above, however, are better at
discriminating between scholars across a whole research field than within a single
research area, and it is also hard to gather impact factors for every paper and author.
Lacking these materials, we intend the %-index to work better in smaller research
areas and yield more valid evaluations of prominent scholars.
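
To make the h-index concrete, here is a minimal Python sketch of its computation
from a list of per-paper citation counts (illustrative only; this is not a component of
the systems discussed above):

```python
def h_index(citations):
    """Largest h such that the scholar has h papers with at least
    h citations each (Hirsch, 2005)."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Example: three papers have >= 3 citations, but no four papers
# have >= 4 citations, so the h-index is 3.
print(h_index([10, 8, 5, 3, 0]))  # -> 3
```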

2.3 Expert Profiling


Another important part of a scholar searching system is the expert profiling task.
Specifically, it studies how to automatically extract the profile of an individual
researcher from the Web. Several research efforts have addressed expert profiling
[12,13,23]. Recently, Tang and coworkers [12,13] presented a unified approach to
extracting scholar profiles from an academic social network; their system also
addresses the name disambiguation problem [24] during integration. Many profile
extraction methods have been proposed; an overview can be found in [25].

3 Our Approach
In this section, we describe our scholar searching system and its components, and
demonstrate the system with several simple scholar search experiments. First, we
give an overview of the system's main concepts, the corresponding task components,
and their interplay. We then built a system prototype based on these concepts, ran a
number of search experiments, and compared our approach with other systems, such
as Arnetminer and Libra.

3.1 System Overview


Our scholar searching system consists of three main components, depicted in
Figure 1. In the first step of our approach, a dedicated crawler crawls a scientific
literature digital archive to gather the candidate set, called the c set. The crawler
collects scientific literature related to the given query topic q and extracts the author
names from the crawled articles; in addition, it analyzes the citations of each article
and each candidate. After obtaining the c set, the second step of our system is to
estimate the association between the topic and the candidates. For estimating
relevance to the given topic, we make the following claim.

Claim 1: Authors with many articles about a certain topic are more likely to be
experts on the topic q.

Figure 1. A system overview of our approach and its main components

This claim should be reasonable because the more important a scholar, the more
popular his/her name should be on the Web [14]; e.g. an important scholar's name
appears on a large number of web pages, such as conference programs, seminar
pages, and journal papers. Since our idea is based on this claim, we estimate each
extracted author's relevance to the given topic through statistical analysis of web
pages. A number of statistical methods have been proposed for estimating term
association based on co-occurrence measures [26]. In our study, the chi-square test is
adopted because the parameters it requires can easily be gathered through a search
engine. After this step, we can rank the candidates according to their chi-square test
results and determine the top-k candidates in the c set. Finally, we apply our ranking
function, the %-index, a novel way of estimating an individual scholar's impact in a
single research field; we define it in Section 3.3. The following sections describe the
details of the two main components, scholar extraction and scholar ranking,
respectively.
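
To make the interplay of these steps concrete, the following Python sketch outlines
the pipeline under stated assumptions: search_archive and hit_count are hypothetical
helpers standing in for the Google Scholar crawler and Google hit counts (neither
service exposes an official API for this), and chi_square is the statistic sketched in
Section 3.2.

```python
from collections import defaultdict

def rank_scholars(topic, k=20, n=8_000_000_000):
    # Step 1: crawl the archive and build the candidate set (c set) with
    # per-author citation totals. search_archive is a hypothetical helper
    # returning [(author_list, citation_count), ...] for the topic.
    papers = search_archive(topic)
    citations = defaultdict(int)
    for authors, cites in papers:
        for author in authors:
            citations[author] += cites

    # Step 2: keep the top-k candidates by chi-square relevance, using
    # hypothetical hit_count() queries for the topic, the name, and both.
    topic_hits = hit_count(f'"{topic}"')
    def relevance(author):
        both = hit_count(f'"{topic}" "{author}"')
        name_hits = hit_count(f'"{author}"')
        return chi_square(both, topic_hits - both, name_hits - both, n)
    top_k = sorted(citations, key=relevance, reverse=True)[:k]

    # Step 3: order the survivors by %-index, their share of the field's
    # total citations (defined in Section 3.3).
    total = sum(cites for _, cites in papers) or 1  # guard: empty crawl
    return sorted(((a, 100.0 * citations[a] / total) for a in top_k),
                  key=lambda pair: pair[1], reverse=True)
```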

3.2 Relevance Estimation


In our system, we use the chi-square test to estimate the strength of the relation
between an extracted scholar and the given research topic from their co-occurrence
on web pages. The chi-square test is easy to implement in our system; the required
parameters are listed in Table 1. Given a query topic q and a candidate's name c,
assume q and c are independent. As in [27], the expected counts are

E(q, c) = (a+c)(a+b)/n,
E(q, ¬c) = (b+d)(a+b)/n,
E(¬q, c) = (a+c)(c+d)/n, and
E(¬q, ¬c) = (b+d)(c+d)/n.

Then we obtain the conventional chi-square statistic:

χ²(q, c) = Σ_{X∈{q,¬q}, Y∈{c,¬c}} [n(X, Y) − E(X, Y)]² / E(X, Y)
         = n(ad − bc)² / [(a+b)(a+c)(b+d)(c+d)].
This chi-square statistic plays a crucial role in our system and serves approximately
as a co-occurrence index. In our implementation we use Google as the search engine,
but other major search engines are equally applicable. The chi-square method
provides a simple way to estimate the relevance between a candidate and a research
topic and is easy to implement, but its performance is strongly dominated by the
number of retrieved web pages.

Notation       Required parameter
n              the total number of Web pages
a              the number of Web pages containing both the candidate's name and the topic
b              the number of Web pages containing the topic but not the candidate's name
c              the number of Web pages containing the candidate's name but not the topic
d = n−a−b−c    the number of Web pages containing neither the candidate's name nor the topic

Table 1. The required parameters for the chi-square test (we set n = 8 billion in our
experiments)
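
As an illustration, the statistic can be computed directly from the page counts of
Table 1. The sketch below assumes a, b, and c have already been obtained as search
engine hit counts; the numbers in the usage example are made up for demonstration.

```python
def chi_square(a, b, c, n=8_000_000_000):
    """Chi-square co-occurrence statistic between a topic and a
    candidate name, from the four Web page counts of Table 1.

    a: pages containing both the topic and the candidate's name
    b: pages containing the topic but not the candidate's name
    c: pages containing the candidate's name but not the topic
    n: total number of Web pages (fixed to 8 billion in the paper)
    """
    d = n - a - b - c                       # pages containing neither
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

# Hypothetical hit counts: 46,600 pages mention both the name and the
# topic, 20.8M mention the topic, and 120,000 mention the name.
print(chi_square(a=46_600, b=20_800_000 - 46_600, c=120_000 - 46_600))
```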

3.3 Scholar Ranking


Existing ranking indexes have several limitations. One major limitation is that
they are used to rank a whole research field, such as all of computer science, so it is
hard to infer a scholar's contribution within a research subfield. For example,
CiteSeerX provides a citation-count ranking over all of computer science, but it does
not make it easy to find a significant researcher in the "Network Coding" research
area. Hence we design a novel ranking function, the %-index, to estimate an
individual scholar's impact in a single research field, such as network coding. A
scholar's %-index is m if the total citations of the scholar's papers amount to m% of
the total citations of all papers in the research field. Formally,

%-index(Ci) = ( Σ citations of Ci's papers / Σ citations of papers in θ ) × 100,

where Ci denotes a scholar in the c set and θ denotes the set of collected scientific
literature.
Assessing scholars is, however, a complex social and scientific process. Our
%-index could be used alone, but it should probably serve as one quantitative
indicator within a more comprehensive methodology. Beyond publications, many
other factors, such as research impact, funding, and students, reflect the importance
of a scholar and could be taken into consideration in future work.
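
A minimal sketch of the %-index, assuming the per-paper citation counts of the field
have already been collected by the crawler (the numbers below are a made-up
example):

```python
def percent_index(scholar_citations, field_citations):
    """%-index: the scholar's share of all citations in the field.

    scholar_citations: citation counts of the scholar's papers in the field
    field_citations: citation counts of every collected paper (the set θ)
    """
    total = sum(field_citations)
    if total == 0:
        return 0.0
    return 100.0 * sum(scholar_citations) / total

# Hypothetical example: a field whose papers total 10,000 citations;
# a scholar whose papers received 965 of them gets a %-index of 9.65.
print(percent_index([500, 400, 65], [500, 400, 65, 9035]))  # -> 9.65
```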

3.4 System Implementation and Demo


Our scholar searching system is implemented as a CGI dynamic web page in
Perl. Figure 2 shows the portal page of our system: users enter a research topic to
retrieve the top-k scholars for the topic (we generally set k = 20). Figure 3 shows the
output of the system. Our system has one drawback: it needs more than five minutes
to process a request, because the Google search engine does not accept a burst of
rapid requests. We also show the results of querying "data mining" and "association
rules" in Table 2 and Table 3, respectively; note that association rules is a subfield of
data mining. We analyze these results in the following section.

Figure 2. Portal of our system
Figure 3. The ranking results

Ranking  Candidate             %-index       χ² test
1        J Han                 9.647531009   2652383
2        M Kamber              4.334521751   3927792
3        E Frank               3.833536147   2044977
4        IH Witten             3.833536147   1977850
5        G Piatetsky-Shapiro   3.728926937   28694484
6        P Smyth               3.595141009   4588678
7        T Hastie              3.498359699   2951988
8        R Tibshirani          3.498359699   2219211
9        J Friedman            3.498359699   186502
10       JC Bezdek             3.280601752   424565
11       UM Fayyad             3.218690179   25534724
12       R Agrawal             2.558300065   4679901
13       JA Hartigan           2.019598215   180509
14       J Shawe-Taylor        1.906449478   1827019
15       N Cristianini         1.906449478   1503047
16       J Pei                 1.84311465    1646410
17       PS Yu                 1.813226305   2855124
18       U Fayyad              1.641012503   6263246
19       Y Yin                 1.609700903   248230
20       Ming-Syan Chen        1.60258463    3107667

Table 2. The ranking result for the data mining research area

Ranking  Candidate             %-index       χ² test
1        R Agrawal             31.18867812   63317224.61
2        R Srikant             20.87173693   255685481.9
3        A Swami               9.587025034   234630270.1
4        T Imielinski          8.729262056   350995142.7
5        J Han                 8.076648629   33356763.11
6        G Piatetsky-Shapiro   6.964678599   107474023.8
7        H Mannila             6.30909199    67368432.23
8        P Smyth               5.750133793   13005709.05
9        H Toivonen            5.503359696   41276775.5
10       UM Fayyad             5.249152643   26075405.81
11       PS Yu                 3.891895106   18344775.38
12       Ming-Syan Chen        3.432538503   16059136.61
13       R Motwani             3.041565083   8234266.614
14       S Brin                2.84979485    7073997.568
15       MJ Zaki               2.347327109   53639929.75
16       A Savasere            2.229886424   35516464.2
17       DW Cheung             1.948920735   5068106.193
18       B Liu                 1.934054825   3484209.628
19       N Pasquier            1.87905096    17276391.84
20       R Taouil              1.855265505   18117526.04

Table 3. The ranking result for the association rules research area, a subfield of data
mining

4 Experimental Results
To validate our system, we used it to perform two rankings. The first ranking
assessed scholars in the data mining area; the second evaluated the association rules
field. We compared both ranking results against the Arnetminer and MS Libra
systems by analyzing the co-occurrence of each scholar's name with the query topic
on the Web. First we give overview statistics for these two fields. Note that, due to
the limitations of Google Scholar, our crawler only retrieved the first 1,000 papers of
each Google Scholar result set.
Figure 4 depicts the citation distribution of the collected papers. From this figure
we can see that citation impact declines sharply after roughly the 100th paper. There
is one extremely highly cited item in the data mining field: a book, "Neural
Networks: A Comprehensive Foundation" by Simon Haykin, with more than 11,407
citations. Although this author has one very highly cited work, his importance in
data mining may not exceed that of the top-k scholars: his chi-square score is 107836,
far below 180509 (JA Hartigan's chi-square score). Figure 4 also shows that the top
0.1% of papers account for 95% of citations in both fields. Figure 5 depicts the
citation distribution over the collected scholars; it likewise shows that a few people
receive the great majority of citations, similar to Figure 4. We assume a full name
identifies a person; we do not deal with the name disambiguation problem here.

Figure 4. Citation distribution (paper)
Figure 5. Citation distribution (person)

Next, we compared our approach with the Arnetminer and MS Libra systems by
analyzing the co-occurrence of each scholar's name with the query topic on the Web.
The rankings of our approach, Arnetminer, and MS Libra are shown in the following
tables: Table 4 shows the results for the data mining query and Table 5 the results
for the association rules query. The number to the right of each name is the number
of Web pages (again from Google) on which the name and the given research topic
co-occur. We now define the performance metric used to evaluate the three systems.
Let R be the co-occurrence count of a name with the given research topic in the
tables, and let Ti be the set of scholars produced by each system.

median = median(T1 ∪ T2 ∪ T3)
η = Σ_{∀R∈Ti} (R ≥ median ? 1 : 0)

η is our performance metric for evaluating the systems, where median() returns the
median co-occurrence count over the union of the three result sets. Figures 6 and 7
show the comparison results for the two experiments. In the first experiment, the
data mining query, the three systems perform similarly. In the second experiment,
our approach separates itself more clearly from the other two; note that mining
association rules is a subfield of the whole data mining area. The reason is the impact
of our %-index metric, which is designed for ranking within a specific research field.
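
A small sketch of this metric, assuming each system's result list is given as a
mapping from scholar name to its co-occurrence count (the names and numbers
below are placeholders, not the paper's data):

```python
from statistics import median

def eta(system_counts, all_systems):
    """eta: how many of a system's scholars have a co-occurrence count
    at least the median over the union T1 U T2 U T3 of all result lists."""
    merged = {}
    for counts in all_systems:
        merged.update(counts)          # union keyed by scholar name
    m = median(merged.values())
    return sum(1 for r in system_counts.values() if r >= m)

# Placeholder example with three tiny result lists:
t1 = {"A": 9, "B": 8}
t2 = {"C": 7, "D": 2}
t3 = {"E": 6, "F": 1}
print([eta(t, [t1, t2, t3]) for t in (t1, t2, t3)])  # -> [2, 1, 0]
```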

Figure 6. Comparison results for the data mining query

Our Approach (T1)              Arnetminer (T2)                      MS Libra (T3)
J Han (46,600)                 Jiawei Han (46,600)                  Rakesh Agrawal (21,500)
M Kamber (13,900)              Christos Faloutsos (17,300)          Tomasz Imielinski (5,610)
E Frank (20,600)               Philip S. Yu (19,800)                Jiawei Han (46,600)
IH Witten (23,900)             Mohammed Javeed Zaki (2,510)         Philip S. Yu (19,800)
G Piatetsky-Shapiro (12,000)   Heikki Mannila (19,100)              Christos Faloutsos (17,200)
P Smyth (15,800)               Rakesh Agrawal (21,500)              Ramakrishnan Srikant (14,300)
T Hastie (10,600)              Jian Pei (11,700)                    Heikki Mannila (19,100)
R Tibshirani (8,470)           Usama M. Fayyad (9,230)              Ian H. Witten (23,900)
J Friedman (7,420)             Eamonn J. Keogh (1,750)              Padhraic Smyth (15,800)
JC Bezdek (2,620)              Charu C. Aggarwal (8,060)            Hans-Peter Kriegel (9,760)
UM Fayyad (9,230)              Johannes Gehrke (10,900)             Gregory Piatetsky-Shapiro (12,000)
R Agrawal (21,500)             Wei Wang (57,900)                    Arun N. Swami (2,500)
JA Hartigan (1,280)            Srinivasan Parthasarathy (5,410)     Ming-Syan Chen (8,870)
J Shawe-Taylor (6,210)         Haixun Wang (6,020)                  Mohammed Javeed Zaki (2,510)
N Cristianini (4,980)          Jiong Yang (7,570)                   Hannu Toivonen (7,070)
J Pei (11,700)                 Salvatore J. Stolfo (5,660)          Raymond T. Ng (7,120)
PS Yu (19,800)                 Bing Liu (18,100)                    Usama M. Fayyad (9,230)
U Fayyad (8,240)               Gregory Piatetsky-Shapiro (12,000)   Salvatore J. Stolfo (5,660)
Y Yin (4,460)                  Chris Clifton (5,130)                Jim Gray (15,300)
Ming-Syan Chen (8,870)         Ming-Syan Chen (8,870)               Vipin Kumar (21,400)

Table 4. Comparison of the scholar ranking systems for the data mining research
area (median = 9230)

Figure 7. Comparison results for the association rules query

Our Approach (T1)              Arnetminer (T2)                   MS Libra (T3)
R Agrawal (13,000)             Jiawei Han (13,600)               Rakesh Agrawal (13,000)
R Srikant (9,360)              Philip S. Yu (6,860)              Tomasz Imielinski (5,320)
A Swami (1,720)                Rakesh Agrawal (13,000)           Ramakrishnan Srikant (9,360)
T Imielinski (5,320)           Ramakrishnan Srikant (9,360)      Arun N. Swami (1,720)
J Han (13,600)                 David Wai-Lok Cheung (1,210)      Heikki Mannila (6,700)
G Piatetsky-Shapiro (5,470)    Ke Wang (15,100)                  Hannu Toivonen (19,000)
H Mannila (6,700)              Bing Liu (4,150)                  Jiawei Han (13,600)
P Smyth (3,200)                Mohammed Javeed Zaki (1,300)      A. Inkeri Verkamo (2,880)
H Toivonen (19,000)            Yasuhiko Morimoto (976)           Philip S. Yu (6,860)
UM Fayyad (4,150)              Takeshi Tokuyama (1,070)          Ming-Syan Chen (4,300)
PS Yu (6,860)                  Takeshi Fukuda (1,090)            Yongjian Fu (2,770)
Ming-Syan Chen (4,300)         Shinichi Morishita (1,610)        Shamkant B. Navathe (1,360)
R Motwani (3,250)              Charu C. Aggarwal (2,910)         Jong Soo Park (1,920)
S Brin (3,020)                 Frans Coenen (897)                Mohammed Javeed Zaki (1,300)
MJ Zaki (1,300)                Paul H. Leng (27)                 Edward Omiecinski (1,770)
A Savasere (75)                Yiming Ma (1,810)                 Rajeev Motwani (3,250)
DW Cheung (1,210)              Ling Feng (1,660)                 Wei Li (13,500)
B Liu (4,150)                  Ming-Syan Chen (4,300)            Vipin Kumar (4,450)
N Pasquier (3,290)             Vassilios S. Verykios (67)        Ashoka Savasere (75)
R Taouil (2,980)               Wynne Hsu (2,530)                 Srinivasan Parthasarathy (2,280)

Table 5. Comparison of the scholar ranking systems for the association rules
research area (median = 2880)

5 Conclusion
In this project, we presented the design and implementation of a prototype
scholar searching system based on a web mining approach. Our system computes a
ranking of the scholars relevant to a given research area, e.g. data mining, and
returns the top-k scholars. We also designed the %-index, a new ranking function for
positioning a scholar within a specific research field. Our contributions in this work
are: 1) we propose a web mining approach to searching for famous/authoritative
scholars, 2) we develop a flexible ranking function, the %-index, that facilitates
scholar ranking within a narrower research field, and 3) we implement and
demonstrate our scholar searching system as a working web service. A main
advantage of our approach is that users can query any research topic and obtain a list
of authoritative scholars without a dedicated database for each topic. Based on the
experimental results, our approach outperforms Arnetminer and MS Libra within a
specific research field. We hope our system makes studying more convenient for
junior students.

References
1. P. Jacso, Google Scholar: the Pros and the Cons, Online Information Review,
pages 208-214, 2005.
2. Henry Kautz, Bart Selman, and Mehul Shah. ReferralWeb: Combining Social
Networks and Collaborative Filtering, Communications of the ACM, vol. 40,
no. 3, March 1997.
3. Krisztian Balog, Toine Bogers, Leif Azzopardi, Maarten de Rijke, Antal van den
Bosch: Broad expertise retrieval in sparse data environments. SIGIR 2007:
551-558.
4. M. Maybury. Expert Finding Systems. Technical Report MTR. 06B000040,
MITRE Corporation, 2006.
5. Nick Craswell and Arjen P. de Vries, Overview of the TREC-2005 Enterprise
Track, In Proceedings of the 15th Text Retrieval Conference (TREC), 2006.
6. Dumais, S. T. and Nielsen, J., Automating the assignment of submitted
manuscripts to reviewers. In N. Belkin, P. Ingwersen, and A. M. Pejtersen (Eds.),
SIGIR'92: Proceedings of the 15th Annual International ACM SIGIR Conference
on Research and Development in Information Retrieval, ACM Press,
pp. 233-244, 1992.
7. Stefano Ferilli, Nicola Di Mauro, Teresa Maria Altomare Basile, Floriana Esposito,
Marenglen Biba: Automatic Topics Identification for Reviewer Assignment.
IEA/AIE 2006: 721-730
8. Toine Bogers, Klaas Kox, and Antal van den Bosch, “Using citation analysis for
expert retrieval in workgroups,” Proceedings of the 8th Belgian-Dutch
Information Retrieval Workshop (DIR 2008), pp 21-28. Maastricht, April 2008.
9. Jing Zhang, Jie Tang, Liu Liu, and Juanzi Li. A Mixture Model for Expert Finding.
In Proceedings of 2008 Pacific-Asia Conference on Knowledge Discovery and
Data Mining (PAKDD’2008).
10. Jun Zhang, Mark S. Ackerman, and Lada Adamic, Expertise networks in online
communities: structure and algorithms, Proceedings of the 16th International
Conference on World Wide Web, May 8-12, 2007, Banff, Alberta, Canada.
11. Tim Reichling, Michael Veith, and Volker Wulf, Expert Recommender: Designing
for a Network Organization, Computer Supported Cooperative Work, vol. 16,
no. 4-5, pp. 431-465, October 2007.
12. Jie Tang, Duo Zhang, and Limin Yao. Social Network Extraction of Academic
Researchers. In Proceedings of 2007 IEEE International Conference on Data
Mining (ICDM’2007).
13. Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. Extraction
and Mining of Academic Social Network. In Proceedings of the Fourteenth ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining
(SIGKDD'2008).
14. Masanori Harada, Shin-ya Sato, and Kazuhiro Kazama: Finding Authoritative
People from the Web, Joint Conference on Digital Libraries (JCDL 2004), June 2004.
15. Jie Ren and Richard Taylor, Automatic and Versatile Publications Ranking for
Research Institutions and Scholars, Communications of the ACM, June 2007.
16. Hirsch, J. E., An index to quantify an individual's scientific research output,
Proceedings of the National Academy of Sciences, vol. 102, no. 46,
pp. 16569-16572, 2005.
17. MS Libra Academic Search, http://libra.msra.cn/
18. CiteSeerX, http://citeseerx.ist.psu.edu/
19. Littlepage, G. E., & Mueller, A. L. Recognition and utilization of expertise in
problem-solving groups: Expert characteristics and behavior. Group Dynamics:
Theory, Research, and Practice, 1, 324-328 (1997).
20. D. Yimam-Seid and A. Kobsa. Expert finding systems for organizations: Problem
and domain analysis and the demoir approach. Journal of Organizational
Computing and Electronic Commerce, 13(1): 1--24, 2003.
21. Thomas Hofmann, Probabilistic Latent Semantic Analysis, Proceedings of the
Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI'99), 1999.
22. Zaiqing Nie, Ji-Rong Wen, Wei-Ying Ma, Object-Level Vertical Search, in
Proceedings of the Third Biennial Conference on Innovative Data Systems
Research (CIDR) 2007.
23. K. Balog and M. de Rijke, Finding Experts and their Details in E-mail Corpora, in
15th International World Wide Web Conference (WWW 2006), May 2006.
24. R. Bekkerman and A. McCallum, Disambiguating Web Appearances of People in
a Social Network, In Proc. of the 14th International World Wide Web Conference,
pp. 463-470, 2005.
25. Jie Tang, Mingcai Hong, Duo Zhang, Bangyong Liang, and Juanzi Li. Information
Extraction: Methodologies and Applications. In Emerging Technologies of Text
Mining: Techniques and Applications, Hercules A. Prado and Edilson Ferneda
(Eds.), Idea Group Inc., Hershey, USA, 2007, pp. 1-33.
26. R. Rapp. Automatic Identification of Word Translations from Unrelated English
and German Corpora, In Proceedings of the 37th Annual Meeting of the Association
for Computational Linguistics (ACL), pp. 519-526, 1999.
27. Pu-Jen Cheng, Jei-Wen Teng, Ruei-Cheng Chen, Jenq-Haur Wang, Wen-Hsiang
Lu, Lee-Feng Chien: Translating unknown queries with web corpora for
cross-language information retrieval. SIGIR 2004: 146-153.
