
DOI:10.1145/1498765.1498780 Bertrand Meyer, Christine Choppy, Jørgen Staunstrup, and Jan van Leeuwen

Viewpoint
Research Evaluation for Computer Science
Reassessing the assessment criteria and techniques traditionally used in evaluating computer science research effectiveness.

ACADEMIC CULTURE IS changing. The rest of the world, including university management, increasingly assesses scientists; we must demonstrate worth through indicators, often numeric. While the extent of the syndrome varies with countries and institutions, La Fontaine’s words apply: “not everyone will die, but everyone is hit.” Tempting as it may be to reject numerical evaluation, it will not go away. The problem for computer scientists is that assessment relies on often inappropriate and occasionally outlandish criteria. We should at least try to base it on metrics acceptable to the profession.

In discussions with computer scientists from around the world, this risk of deciding careers through distorted instruments comes out as a top concern. In the U.S. it is mitigated by the influence of the Computing Research Association’s 1999 “best practices” report.^a In many other countries, computer scientists must repeatedly explain the specificity of their discipline to colleagues from other areas, for example in hiring and promotion committees. Even in the U.S., the CRA report, which predates widespread use of citation databases and indexes, is no longer sufficient.

Informatics Europe, the association of European CS departments,^b has undertaken a study of the issue, of which this Viewpoint column is a preliminary result. Its views commit the authors only. For ease of use the conclusions are summarized through 10 concrete recommendations.

Our focus is evaluation of individuals rather than departments or laboratories. The process often involves many criteria, whose importance varies with institutions: grants, number of Ph.D.s and where they went, community recognition such as keynotes at prestigious conferences, best paper and other awards, editorial board memberships. We mostly consider a particular criterion that always plays an important role: publications.

Research Evaluation
Research is a competitive endeavor. Researchers are accustomed to constant assessment: any work submitted—even, sometimes, invited—is peer-reviewed; rejection is frequent, even for senior scientists. Once published, a researcher’s work will be regularly assessed against that of others. Researchers themselves referee papers for publication, participate in promotion committees, evaluate proposals for funding agencies, answer institutions’ requests for evaluation letters. The research management edifice relies on assessment of researchers by researchers.

Criteria must be fair (to the extent possible for an activity circumscribed by the frailty of human judgment); openly specified; accepted by the target scientific community. While other disciplines often participate in evaluations, it is not acceptable to impose criteria from one discipline on another.

Computer Science
Computer science concerns itself with the representation and processing of information using algorithmic techniques. (In Europe the more common term is Informatics, covering a slightly broader scope.) CS research includes two main flavors, not mutually exclusive: Theory, developing models of computations, programs, languages; Systems, building software artifacts and assessing their properties. In addition, domain-specific research addresses specifics of information and computing for particular application areas. CS research often combines aspects of engineering and natural sciences as well as mathematics.

This diversity is part of the discipline’s attraction, but also complicates evaluation.

Across these variants, CS research exhibits distinctive characteristics, captured by seminal concepts: algorithm, computability, complexity, specification/implementation duality, recursion, fixpoint, scale, function/data duality, static/dynamic duality, modeling, interaction… Not all scientists from other disciplines realize the existence of this corpus. Computer scientists are responsible for enforcing its role as basis for evaluation:

1. Computer science is an original discipline combining science and engineering. Researcher evaluation must be adapted to its specificity.

The CS Publication Culture
In the computer science publication culture, prestigious conferences are a favorite tool for presenting original research—unlike disciplines where the prestige goes to journals and conferences are for raw initial results. Acceptance rates at selective CS conferences hover between 10% and 20%; in 2007–2008:
• ICSE (software engineering): 13%
• OOPSLA (object technology): 19%
• POPL (programming languages): 18%

Journals have their role, often to publish deeper versions of papers already presented at conferences. While many researchers use this opportunity, others have a successful career based largely on conference papers. It is important not to use journals as the only yardsticks for computer scientists.

Books, which some disciplines do not consider important scientific contributions, can be a primary vehicle in CS. Asked to name the most influential publication ever, many computer scientists will cite Knuth’s The Art of Computer Programming. Seminal concepts such as Design Patterns first became known through books.

2. A distinctive feature of CS publication is the importance of selective conferences and books. Journals do not necessarily carry more prestige.

Publications are not the only scientific contributions. Sometimes the best way to demonstrate value is through software or other artifacts. The Google success story involves a fixpoint algorithm: PageRank, which determines the popularity of a Web page from the number of links to it. Before Google was commercial it was research, whose outcome included a paper on PageRank and the Google site. The site had—beyond its future commercial value—a research value that the paper could not convey: demonstrating scalability. Had the authors continued as researchers and come up for evaluation, the software would have been as significant as the paper.
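
The fixpoint character of PageRank is easy to see in miniature. The sketch below is ours, not Google’s code; the toy link graph and damping value are illustrative assumptions. It repeatedly redistributes rank along links until the values stabilize, that is, reach a fixpoint.

```python
# Illustrative PageRank sketch (not Google's implementation).
# Rank flows along links; iterating the update converges to a fixpoint.
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            # A page shares its current rank among the pages it links to;
            # a page with no outgoing links shares it among all pages.
            receivers = targets or pages
            share = rank[page] / len(receivers)
            for target in receivers:
                new_rank[target] += damping * share
        rank = new_rank
    return rank

# Toy graph: "a" is linked to by both other pages and ends up ranked highest.
print(pagerank({"a": ["b"], "b": ["a"], "c": ["a"]}))
```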

Assessing such contributions is delicate: a million downloads do not prove scientific value. Publication, with its peer review, provides more easily decodable evaluation grids. In assessing CS and especially Systems research, however, publications do not suffice:

3. To assess impact, artifacts such as software can be as important as publications.

Another issue is assessing individual contributions to multi-author work. Disciplines have different practices (2007–2008):
• Nature over a year: maximum coauthors per article 22, average 7.3
• American Mathematical Monthly: 6, 2
• OOPSLA and POPL: 7, 2.7

Disciplines where many coauthors are the norm use elaborate name-ordering conventions to reflect individual contributions. This is not the standard culture in CS (except for such common practices as listing a Ph.D. student first in a joint paper with the advisor).

4. The order in which a CS publication lists authors is generally not significant. In the absence of specific indications, it should not serve as a factor in researcher evaluation.

Bibliometry
In assessment discussions, numbers typically beat no numbers; hence the temptation to reduce evaluations to such factors as publication counts, measuring output, and citation counts, measuring impact (and derived measures such as indexes, discussed next).

While numeric criteria trigger strong reactions,^c alternatives have problems too: peer review is strongly dependent on evaluators’ choice and availability (the most competent are often the busiest), can be biased, and does not scale up. The solution is in combining techniques, subject to human interpretation:

5. Numerical measurements such as publication-related counts must never be used as the sole evaluation instrument. They must be filtered through human interpretation, particularly to avoid errors, and complemented by peer review and assessment of outputs other than publications.

Measures should not address volume but impact. Publication counts only assess activity. Giving them any other value encourages “write-only” journals, speakers-only conferences, and Stakhanovist research profiles favoring quantity over quality.

6. Publication counts are not adequate indicators of research value. They measure productivity, but neither impact nor quality.

Citation counts assess impact. They rely on databases such as ISI, CiteSeer, ACM Digital Library, Google Scholar. They, too, have limitations:
• Focus. Publication quality is just one aspect of research quality, impact one aspect of publication quality, citations one aspect of impact.
• Identity. Misspellings and mangling of authors’ names lose citations. Names with special characters are particularly at risk. If your name is Krötenfänger, do not expect your publications to be counted correctly.
• Distortions. Article introductions heavily cite surveys. The milestone article that introduced NP-completeness has far fewer citations than a later tutorial.
• Misinterpretation. Citation may imply criticism rather than appreciation. Many program verification articles cite a famous protocol paper—to show that their tools catch an equally famous error in the protocol.
• Time. Citation counts favor older contributions.
• Size. Citation counts are absolute; impact is relative to each community’s size.
• Networking. Authors form mutual citation societies.
• Bias. Some authors hope (unethically) to maximize chances of acceptance by citing program committee members.

The last two examples illustrate the occasionally perverse effects of assessment techniques on research work itself.

The most serious problem is data quality; no process can be better than its data. Transparency is essential, as well as error-reporting mechanisms and prompt response (as with ACM and DBLP):

7. Any evaluation criterion, especially quantitative, must be based on clear, published criteria.

This remains wishful thinking for major databases. The methods by which Google Scholar and ISI select documents and citations are not published or subject to debate.

Publication patterns vary across disciplines, reinforcing the comment that we should not judge one by the rules of another:

8. Numerical indicators must not serve for comparisons across disciplines.

This rule also applies to the issue (not otherwise addressed here) of evaluating laboratories or departments rather than individuals.

CS Coverage in Major Databases
An issue of concern to computer scientists is the tendency to use publication databases that do not adequately cover CS, such as Thomson Scientific’s ISI Web of Science.

The principal problem is what ISI counts. Many CS conferences and most books are not listed; conversely, some publications are included indiscriminately. The results make computer scientists cringe.^d Niklaus Wirth, Turing Award winner, appears for minor papers from indexed publications, not his seminal 1970 Pascal report. Knuth’s milestone book series, with an astounding 15,000 citations in Google Scholar, does not figure. Neither do Knuth’s three articles most frequently cited according to Google.

Evidence of ISI’s shortcomings for CS is “internal coverage”: the percentage of citations of a publication in the same database. ISI’s internal coverage, over 80% for physics or chemistry, is only 38% for CS.

Another example is Springer’s Lecture Notes in Computer Science, which ISI classified until 2006 as a journal. A great resource, LNCS provides fast publication of conference proceedings and reports. Lumping all into a single “journal” category was absurd, especially since ISI omits top non-LNCS conferences:
• The International Conference on Software Engineering (ICSE), the top conference in a field that has its own ISI category, is not indexed.
• An LNCS-published workshop at ICSE, where authors would typically try out ideas not yet ready for ICSE submission, was indexed.

ISI indexes SIGPLAN Notices, an unrefereed publication devoting ordinary issues to notes and letters and special issues to proceedings of such conferences as POPL. POPL papers appear in ISI—on the same footing as a reader’s note in a regular issue.

The database has little understanding of CS. Its 50 most cited CS references include “Chemometrics in food science,” from a “Chemometrics and Intelligent Laboratory Systems” journal. Many CS entries are not recognizable as milestone contributions. The cruelest comparison is with CiteSeer, whose Most Cited list includes many publications familiar to all computer scientists; it has not a single entry in common with the ISI list.

ISI’s “highly cited researchers” list includes many prestigious computer scientists but leaves out such iconic names as Wirth, Parnas, and Knuth, and all but one of the 10 2000–2006 Turing Award winners. Since ISI’s process provides no clear role for community assessment, the situation is unlikely to improve.

The inevitable deficiencies of alternatives pale in comparison:

9. In assessing publications and citations, ISI Web of Science is inadequate for most of CS and must not be used. Alternatives include Google Scholar, CiteSeer, and (potentially) ACM’s Digital Library.

Anyone in charge of assessment should know that attempts to use ISI for CS will cause massive opposition and may lead to outright rejection of any numerical criteria, including more reasonable ones.

Assessment Formulae
A recent trend is to rely on numerical measures of impact, derived from citation databases, especially the h-index, the highest n such that C(n) ≥ n, where C(n) is the citation count of the author’s n-th ranked publication. Variants exist:
• The individual h-index divides the h-index by the number of authors, better reflecting individual contributions.
• The g-index, the highest n such that the top n publications received (together) at least n² citations, corrects another h-index deficiency: not recognizing extremely influential publications. (If your second most cited work has 100 citations, the h-index does not care whether the first has 101 or 15,000.)
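
To make these definitions concrete, here is a minimal Python sketch, ours rather than any official implementation, applied to hypothetical citation counts. It reproduces the anomaly just noted: the h-index is identical whether the top paper has 101 or 15,000 citations, while the g-index is not.

```python
# A minimal sketch of the two definitions above, applied to hypothetical citation counts.
def h_index(citations):
    """Largest n such that the n-th most cited publication has at least n citations."""
    ranked = sorted(citations, reverse=True)
    return max((n for n, c in enumerate(ranked, start=1) if c >= n), default=0)

def g_index(citations):
    """Largest n such that the top n publications together have at least n*n citations."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for n, c in enumerate(ranked, start=1):
        total += c
        if total >= n * n:
            g = n
    return g

# Two hypothetical authors, identical except that one has a 15,000-citation landmark paper.
modest  = [101, 100, 90, 3, 1] + [0] * 25
seminal = [15000, 100, 90, 3, 1] + [0] * 25
print(h_index(modest), h_index(seminal))   # 3 3   -- the h-index cannot tell them apart
print(g_index(modest), g_index(seminal))   # 17 30 -- the g-index rewards the landmark paper
```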

The “Publish or Perish” site^e computes these indexes from Google Scholar data. Such indexes cannot be more credible than the underlying databases; results should always be checked manually for context and possible distortions. It would be as counterproductive to reject these techniques as to use them blindly to get definitive researcher assessments. There is no substitute for a careful process involving complementary sources such as peer review.

Assessing Assessment
Scientists are taught rigor: submit any hypothesis to scrutiny, any experiment to duplication, any theorem to independent proof. They naturally assume that processes affecting their careers will be subjected to similar standards. Just as they do not expect, in arguing with a Ph.D. student, to impose a scientifically flawed view on the sole basis of seniority, so will they not let management impose a flawed evaluation mechanism on the sole basis of authority:

10. Assessment criteria must themselves undergo assessment and revision.

Openness and self-improvement are the price to pay to ensure a successful process, endorsed by the community. This observation is representative of our more general conclusion. Negative reactions to new assessment techniques deserve consideration. They are not rejections of assessment per se but calls for a professional, rational approach. The bad news is that there is no easy formula; no tool will deliver a magic number defining the measure of a researcher. The good news is that we have ever more instruments at our disposal, which taken together can help form a truthful picture of CS research effectiveness. Their use should undergo the same scrutiny that we apply to our work as scientists.

^a For this and other references, and the source of the data behind the results, see an expanded version of this column at http://se.ethz.ch/~meyer/publications/cacm/research_evaluation.pdf.
^b See http://www.informatics-europe.org.
^c D. Parnas, “Stop the Numbers Game,” Commun. ACM 50, 11 (Nov. 2007), 19–21; available at http://tinyurl.com/2z652a. Parnas mostly discusses counting publications, but deals briefly with citation counts.
^d All ISI searches as of mid-2008.
^e See http://www.harzing.com/resources.htm#/pop.htm.

Bertrand Meyer (Bertrand.Meyer@inf.ethz.ch) is a professor of software engineering at ETH Zurich, the Swiss Federal Institute of Technology, and chief architect of Eiffel Software, Santa Barbara, CA.

Christine Choppy (Christine.Choppy@lipn.univ-paris13.fr) is a professor of computer science at Université Paris XIII and member of LIPN (Laboratoire d’Informatique de Paris Nord), France.

Jørgen Staunstrup (jst@itu.dk) is provost of the IT University in Copenhagen, Denmark.

Jan van Leeuwen (jan@cs.uu.nl) is a professor of computer science at Utrecht University, The Netherlands.

Copyright held by author.