HF750 SP11
3/24/11
Table of Contents
Introduction: Is Usability Testing Reliable?
The optimal number under question
The layering of effects on results
The evaluator effect bias
Analytical data as a solution
Evaluation of the Evaluation Literature
Summary and Conclusion
Works Cited
Armen Chakmakjian
In reviewing the pertinent literature, this paper will attempt to explain each of these issues and the expert opinions in those areas. Finally, the paper will describe the effect of these issues on the practitioner and propose some possible solutions.
from a probabilistic point of view, the magic number and the methodology behind it may be problematic: individual task-scenario events are not independent, and individual problems are not equally likely to be identified. Lindgaard and Chattratichart point out that the magic number 5 is relied upon excessively by practitioners (Lindgaard & Chattratichart, 2007). They contend that by concentrating on the number of participants rather than on task coverage, many problems are missed. They based their study on their own evaluation of the results of the CUE-4 study (Molich & Dumas, 2006), which compared the results of expert review teams against each other. That study and one of its predecessors, CUE-2 (Molich, Ede, Kaasgaard, & Karyukin, 2004), examined the consistency of findings across teams and organizations. In both cases, roughly 75% of the results reported by teams were unique, and only a small number of problems were found by more than one of the teams involved. As Molich pointed out, and as Lindgaard later may have inferred, consistency of method and task may be a cure for this.
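The "magic number" reasoning that these critics target can be made concrete with the standard problem-discovery model used in this literature (e.g., Virzi, 1992): if every problem is detected with the same probability p in each session, and sessions are independent, the expected proportion of problems found by n participants is 1 - (1 - p)^n. The sketch below uses p = 0.31, an average detection rate commonly assumed in this debate; both that value and the model's independence and equal-likelihood assumptions are exactly what the studies above call into question.

```python
def proportion_found(p: float, n: int) -> float:
    """Expected proportion of problems found by n independent
    participants, when each problem is detected with probability p
    in any single session (the standard discovery model)."""
    return 1 - (1 - p) ** n

# Under the model, returns diminish quickly: five participants
# already uncover a large majority of problems.
for n in range(1, 9):
    print(n, round(proportion_found(0.31, n), 2))
```

The point of the critique is not that the arithmetic is wrong, but that real problems do not share a single detection probability and real sessions are not independent, so the curve overstates how complete a five-user test actually is.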
have shown that posttest evaluations can also be affected (Jacobsen, Hertzum, & John, 1998). The authors point out that the results of their study call into question the use of data from usability tests as a baseline for comparison with other usability evaluation methods.
Paper 2
Use some amount of technically objective data, such as keystrokes and mouse clicks, to create a data stream that can be correlated with evaluators' observations
Expert reviews and user evaluations are complementary techniques, but be prepared for results to vary widely
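The first recommendation can be sketched minimally. The helper below, the event timestamps, and the five-second window are all hypothetical, but they illustrate the idea: a timestamped stream of keystrokes and mouse clicks gives an objective signal against which an evaluator's noted incidents can be cross-checked.

```python
from bisect import bisect_left, bisect_right

def events_near(observation_ts, event_ts, window=5.0):
    """For each evaluator observation timestamp, count how many logged
    input events (keystrokes, mouse clicks) fall within +/- window
    seconds -- a crude objective corroboration of the observation."""
    event_ts = sorted(event_ts)
    return [bisect_right(event_ts, t + window) - bisect_left(event_ts, t - window)
            for t in observation_ts]

# Hypothetical data: click/keystroke times vs. two noted incidents.
clicks = [1.0, 1.2, 9.8, 10.1, 10.4, 30.0]
observations = [10.0, 25.0]
print(events_near(observations, clicks))  # prints [3, 1]
```

An observation surrounded by a burst of input activity is at least consistent with the logged behavior; one with no nearby events may deserve a second look, which is the kind of evaluator-effect check the instrumentation literature argues for.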
Works Cited
Virzi, R. A. (1992). Refining the Test Phase of Usability Evaluation: How Many Subjects is Enough? Human Factors, 34, 457-471.
Christensen, L., & Frøkjær, E. (2010). Distributed Usability Evaluation: Enabling Large-scale Usability Evaluation with User-controlled Instrumentation. Proceedings of NordiCHI 2010 (pp. 118-127). Reykjavik, Iceland: ACM.
Donkers, A. M., Tombaugh, J. W., & Dillon, R. F. (1992). Observer Accuracy in Usability Testing: The Effects of Obviousness and Prior Knowledge of Usability Problems. Carleton University, Department of Psychology, Ottawa, Ontario, Canada.
Greenberg, S., & Buxton, B. (2008). Usability Evaluation Considered Harmful (Some of the Time). CHI 2008 Proceedings (pp. 111-121). Florence, Italy: ACM.
Hughes, M. (1999, November). Rigor in Usability Testing. Technical Communication, (4), pp. 488-494.
Hertzum, M., Jacobsen, N. E., & Molich, R. (2002). Usability Inspections by Groups of Specialists: Perceived Agreement in Spite of Disparate Observations. CHI 2002 (pp. 662-663). Denmark.
Jacobsen, N. E., Hertzum, M., & John, B. E. (1998). The Evaluator Effect in Usability Tests. CHI 98 (pp. 255-256). ACM.
Law, E. L.-C., & Hvannberg, E. T. (2004). Analysis of Combinatorial User Effect in International Usability Tests. CHI 2004, 6, pp. 9-16. Vienna, Austria.
Lindgaard, G., & Chattratichart, J. (2007). Usability Testing: What Have We Overlooked? CHI 2007 Proceedings (pp. 1415-1424). San Jose, CA.
Molich, R., & Dumas, J. S. (2006). Comparative Usability Evaluation (CUE-4). Behaviour and Information Technology, preprint.
Molich, R., Ede, M. R., Kaasgaard, K., & Karyukin, B. (2004). Comparative usability evaluation. Behaviour & Information Technology, 23(1), 65-74.