viewpoints
DOI:10.1145/1498765.1498780

Viewpoint
Research Evaluation for Computer Science
Bertrand Meyer, Christine Choppy, Jørgen Staunstrup, and Jan van Leeuwen

Reassessing the assessment criteria and techniques traditionally used in evaluating computer science research effectiveness.
Academic culture is changing. The rest of the world, including university management, increasingly assesses scientists; we must demonstrate worth through indicators, often numeric. While the extent of the syndrome varies with countries and institutions, La Fontaine's words apply: "not everyone will die, but everyone is hit." Tempting as it may be to reject numerical evaluation, it will not go away. The problem for computer scientists is that assessment relies on often inappropriate and occasionally outlandish criteria. We should at least try to base it on metrics acceptable to the profession.

In discussions with computer scientists from around the world, this risk of deciding careers through distorted instruments comes out as a top concern. In the U.S. it is mitigated by the influence of the Computing Research Association's 1999 "best practices" report.[a] In many other countries, computer scientists must repeatedly explain the specificity of their discipline to colleagues from other areas, for example in hiring and promotion committees. Even in the U.S., the CRA report, which predates widespread use of citation databases and indexes, is no longer sufficient.

[a] For this and other references, and the source of the data behind the results, see an expanded version of this column at http://se.ethz.ch/~meyer/publications/cacm/research_evaluation.pdf.

Informatics Europe, the association of European CS departments,[b] has undertaken a study of the issue, of which this Viewpoint column is a preliminary result. Its views commit the authors only. For ease of use the conclusions are summarized through 10 concrete recommendations.

[b] See http://www.informatics-europe.org.

Our focus is evaluation of individuals rather than departments or laboratories. The process often involves many criteria, whose importance varies with institutions: grants, number of Ph.D.s and where they went, community recognition such as keynotes at prestigious conferences, best paper and other awards, editorial board memberships. We mostly consider a particular criterion that always plays an important role: publications.

Research Evaluation
Research is a competitive endeavor. Researchers are accustomed to constant assessment: any work submitted (even, sometimes, invited) is peer-reviewed; rejection is frequent, even for senior scientists. Once published, a researcher's work will be regularly assessed against that of others. Researchers themselves referee papers for publication, participate in promotion committees, evaluate proposals for funding agencies, answer institutions' requests for evaluation letters. The research management edifice relies on assessment of researchers by researchers. Criteria must be fair (to the extent possible for an activity circumscribed by the frailty of human judgment); openly specified; accepted by the target scientific community. While other disciplines often participate in evaluations, it is not acceptable to impose criteria from one discipline on another.

Computer Science
Computer science concerns itself with the representation and processing of information using algorithmic techniques. (In Europe the more common term is Informatics, covering a slightly broader scope.) CS research includes two main flavors, not mutually exclusive: Theory, developing models of computations, programs, and languages; and Systems, building software artifacts and assessing their properties. In addition, domain-specific research addresses the specifics of information and computing for particular application areas. CS research often combines aspects of engineering and natural sciences as well as mathematics. This diversity is part of the discipline's attraction, but also complicates evaluation.
Across these variants, CS research exhibits distinctive characteristics, captured by seminal concepts: algorithm, computability, complexity, specification/implementation duality, recursion, fixpoint, scale, function/data duality, static/dynamic duality, modeling, interaction… Not all scientists from other disciplines realize the existence of this corpus. Computer scientists are responsible for enforcing its role as basis for evaluation.

Computer scientists will cite Knuth's The Art of Computer Programming; seminal concepts such as Design Patterns first became known through books.

2. A distinctive feature of CS publication is the importance of selective conferences and books. Journals do not necessarily carry more prestige.

Publications are not the only scientific contributions. Sometimes the best way to demonstrate value is through software or other artifacts. The Google success story involves a fixpoint algorithm: PageRank.
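To make the fixpoint idea concrete: PageRank repeatedly feeds each page's score back through the link structure until the scores stop changing, that is, until a fixpoint is reached. A minimal sketch (Python); the four-page link graph and the damping factor are invented for illustration and are not from this column:

```python
# Illustrative sketch of PageRank as a fixpoint computation.
# The link graph and damping factor below are invented for the example.

def pagerank(links, damping=0.85, tol=1e-10):
    """Iterate the rank equation until the ranks stop changing (a fixpoint)."""
    pages = sorted(links)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}
    while True:
        new = {}
        for p in pages:
            # A page's rank is fed by the pages linking to it, each
            # contributing its own rank divided by its out-degree.
            incoming = sum(ranks[q] / len(links[q])
                           for q in pages if p in links[q])
            new[p] = (1 - damping) / n + damping * incoming
        if max(abs(new[p] - ranks[p]) for p in pages) < tol:
            return new  # fixpoint reached
        ranks = new

# Hypothetical link structure: page -> set of pages it links to.
links = {
    "A": {"B", "C"},
    "B": {"C"},
    "C": {"A"},
    "D": {"C"},
}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # → C (it collects the most link weight)
```

The damping term makes the iteration a contraction, so the fixpoint exists and is unique; this sketch ignores refinements such as dangling-page handling.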
Communications of the ACM, April 2009, Vol. 52, No. 4

Bibliometry
In assessment discussions, numbers typically beat no numbers; hence the temptation to reduce evaluations to citation-based measures.
Much bibliometric practice relies on citation databases that do not adequately cover CS, such as Thomson Scientific's ISI Web of Science. The principal problem is what ISI counts. Many CS conferences and most books are not listed; conversely, some publications are included indiscriminately. The results make computer scientists cringe.[d] Niklaus Wirth, Turing Award winner, appears for minor papers from indexed publications, not his seminal 1970 Pascal report. Knuth's milestone book series, with an astounding 15,000 citations in Google Scholar, does not figure. Neither do Knuth's three articles most frequently cited according to Google.

[d] All ISI searches as of mid-2008.

Evidence of ISI's shortcomings for CS is "internal coverage": the percentage of citations of a publication that fall within the same database. ISI's internal coverage, over 80% for physics or chemistry, is only 38% for CS.

Another example is Springer's Lecture Notes in Computer Science, which ISI classified until 2006 as a journal. A great resource, LNCS provides fast publication of conference proceedings and reports. Lumping it all into a single "journal" category was absurd, especially since ISI omits top non-LNCS conferences:
• The International Conference on Software Engineering (ICSE), the top conference in a field that has its own ISI category, is not indexed.
• An LNCS-published workshop at ICSE, where authors would typically try out ideas not yet ready for ICSE submission, was indexed.

ISI indexes SIGPLAN Notices, an unrefereed publication devoting ordinary issues to notes and letters and special issues to proceedings of such conferences as POPL. POPL papers appear in ISI on the same footing as a reader's note in a regular issue.

The database has little understanding of CS. Its 50 most cited CS references include "Chemometrics in food science," from a "Chemometrics and Intelligent Laboratory Systems" journal. Many CS entries are not recognizable as milestone contributions. The cruelest comparison is with CiteSeer, whose Most Cited list includes many publications familiar to all computer scientists; it has not a single entry in common with the ISI list.

ISI's "highly cited researchers" list includes many prestigious computer scientists but leaves out such iconic names as Wirth, Parnas, and Knuth, and all ten of the 2000–2006 Turing Award winners except one. Since ISI's process provides no clear role for community assessment, the situation is unlikely to improve. The inevitable deficiencies of alternatives pale in comparison:

9. In assessing publications and citations, ISI Web of Science is inadequate for most of CS and must not be used. Alternatives include Google Scholar, CiteSeer, and (potentially) ACM's Digital Library.

Anyone in charge of assessment should know that attempts to use ISI for CS will cause massive opposition and may lead to outright rejection of any numerical criteria, including more reasonable ones.

Assessment Formulae
A recent trend is to rely on numerical measures of impact derived from citation databases, especially the h-index: the highest n such that C(n) ≥ n, where C(n) is the citation count of the author's n-th most cited publication. Variants exist:
• The individual h-index divides the h-index by the number of authors, better reflecting individual contributions.
• The g-index, the highest n such that the top n publications received together at least n² citations, corrects another h-index deficiency: not recognizing extremely influential publications. (If your second most cited work has 100 citations, the h-index does not care whether the first has 101 or 15,000.)

The "Publish or Perish" site[e] computes these indexes from Google Scholar data. Such indexes cannot be more credible than the underlying databases; results should always be checked manually for context and possible distortions. It would be as counterproductive to reject these techniques as to use them blindly to obtain definitive researcher assessments. There is no substitute for a careful process involving complementary sources such as peer review.

[e] See http://www.harzing.com/resources.htm#/pop.htm.

Assessing Assessment
Scientists are taught rigor: submit any hypothesis to scrutiny, any experiment to duplication, any theorem to independent proof. They naturally assume that processes affecting their careers will be subjected to similar standards. Just as they do not expect, in arguing with a Ph.D. student, to impose a scientifically flawed view on the sole basis of seniority, so will they not let management impose a flawed evaluation mechanism on the sole basis of authority:

10. Assessment criteria must themselves undergo assessment and revision.

Openness and self-improvement are the price to pay to ensure a successful process, endorsed by the community. This observation is representative of our more general conclusion. Negative reactions to new assessment techniques deserve consideration. They are not rejections of assessment per se but calls for a professional, rational approach. The bad news is that there is no easy formula; no tool will deliver a magic number defining the measure of a researcher. The good news is that we have ever more instruments at our disposal, which taken together can help form a truthful picture of CS research effectiveness. Their use should undergo the same scrutiny that we apply to our work as scientists.

Bertrand Meyer (Bertrand.Meyer@inf.ethz.ch) is a professor of software engineering at ETH Zurich, the Swiss Federal Institute of Technology, and chief architect of Eiffel Software, Santa Barbara, CA.

Christine Choppy (Christine.Choppy@lipn.univ-paris13.fr) is a professor of computer science at Université Paris XIII and a member of LIPN (Laboratoire d'Informatique de Paris Nord), France.

Jørgen Staunstrup (jst@itu.dk) is provost of the IT University in Copenhagen, Denmark.

Jan van Leeuwen (jan@cs.uu.nl) is a professor of computer science at Utrecht University, The Netherlands.

Copyright held by author.
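The h-index and g-index definitions from the Assessment Formulae section translate directly into code. A minimal sketch (Python); the citation lists reuse the column's 101-versus-15,000 example but pad it with invented low-cited papers so the two indexes can diverge:

```python
# Sketch of the citation indexes discussed above; citation counts are invented.

def h_index(citations):
    """Highest n such that the author's n-th most cited paper has >= n citations."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for i, c in enumerate(ranked, start=1) if c >= i)

def g_index(citations):
    """Highest n such that the top n papers together received >= n*n citations
    (capped at the number of papers, per the usual definition)."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for i, c in enumerate(ranked, start=1):
        total += c
        if total >= i * i:
            g = i
    return g

# Two hypothetical records, identical except for the top paper's citation count.
cites_a = [101, 100] + [5] * 20
cites_b = [15000, 100] + [5] * 20
print(h_index(cites_a), h_index(cites_b))  # → 5 5: the h-index cannot tell them apart
print(g_index(cites_a), g_index(cites_b))  # → 16 22: the g-index rewards the hit paper
```

The two records share an h-index while the g-index separates them, which is exactly the deficiency the g-index is meant to correct.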