
Effectiveness of Online Assessment

Dr. Denise Woit, Dr. David Mason


Ryerson University
Toronto, Ontario
{dwoit,dmason}@ryerson.ca
Abstract

For five academic years we have engaged in an on-going study of the effectiveness of online assessment of student programming abilities for introductory programming courses in Computer Science. Our results show that online evaluation can be implemented securely and efficiently, and can result in increased student motivation and programming efficacy; however, unless online components are integrated throughout the course evaluations, student competence will be underestimated. Our data reveals disadvantages of online evaluations, but also shows that both students and faculty benefit when online evaluations are implemented appropriately.

1. Introduction

We were motivated to begin online evaluation of our students because we hoped to improve student programming efficacy and measure it more accurately. Through discussions with senior students, T.A.s and other instructors, and through our personal observations that the actual programming skills of some students were much poorer than predicted by their course work scores (especially assignment and laboratory exercises), we became concerned that copying of these exercises was a significant problem among our first year students. It appeared plausible that students could obtain excellent assignment and laboratory marks through little or no practical work of their own, and could manage to concatenate enough memorized code fragments on written examinations to achieve a “good” grade in the course through part-marks. We sought the ability to clearly identify those students who had not engaged in the amount of practical course work we expected. Furthermore, we hoped that our improved evaluation techniques would motivate students to participate more earnestly in the practical components of the course.

2. Related Work

Our speculation and anecdotal evidence concerning cheating on assignment and laboratory work among our students is in keeping with past and recent international studies. In Sheard’s recent study [12] of 287 undergraduates (137 in Computer Science and Software Engineering at Monash University, and 150 in Information Technology at Swinburne University), students report it acceptable to copy a majority of an assignment from a friend, to submit a friend’s assignment from a past running of the course, to hire a person to write an assignment for them, or to resubmit an assignment from a previous subject in a new subject. More specifically, 34%/28% said they had personally copied the majority of an assignment from a friend, and 53%/38% said they personally knew someone who did this. They reported 52%/42% personally collaborated on an assignment meant to be completed individually, and 75%/63% personally knew someone who did this. Also, 77%/79% said they would do nothing if they observed another student cheating on an assignment. Results from this survey support our own anecdotal evidence and study data.

Sheard [12] found their results “broadly in agreement” with similar studies. For example, a study [11] of 791 undergraduates found that 90% would not report cheating by others, and in a more recent study [3], 49%-86% agreed with this notion. Several studies report a high number of students who admit to cheating. A study of 422 students [10] reports 92% admit to being involved in incidents of “academic misconduct”, while in studies [9, 5] of 943 and 500 undergraduates, 88% and 90%, respectively, admit to cheating. In another study [6] of 380 undergraduates, 54% reported they cheated, while only 1% reported being caught cheating.

Since we began our online evaluation studies, a number of other educators have reported similar studies, although on a smaller scale. Chamillard [2] reports incorporating two online “lab practica” into course work for approximately 500 students enrolled in an introductory Computer Science course at the U.S. Air Force Academy. Califf [1] reports instituting an online “laboratory exam” in CS1 over two years, with 200-280 students per year. English describes an end-of-year online exam for 64 students in a first year programming course [4]. In the Chamillard and Califf studies, the online tests were aimed at assessing only students’ practical skills, and were given in addition to the customary course tests and exam. In the English study, the online evaluation constituted the course final exam, and contained multiple-choice and short-answer questions in addition to programming questions. As described more fully in the sequel, these studies generally support our own findings about the effectiveness of online evaluations; they also hint at the weaknesses of online evaluations revealed by our data.
Study Implementation

Over a total of five academic years we have studied incorporating online components into our first year language course. This course is administered alongside CS2, and its main objective is to impart practical skills in Unix shell usage, in Unix shell programming, in the procedural paradigm, and in the C language, all of which are essential in future courses. When our study began, approximately 150 students were enrolled in this course; the enrolment has increased over the years to approximately 210 in the most recent year.

We employ a multiple-case study design, with each trial studied and analyzed independently, as well as in relation to the others. The studies each include an online final exam; however, they vary in whether laboratory assignments are marked or voluntary, and in the frequency of other online evaluations, which ranged from having no other online components at all to having online weekly quizzes worth a majority of course marks. We assessed our results in relation to student efficacy, student stress, student motivation, and instructor evaluation time and satisfaction.

We differentiate among our four individual case studies according to their configurations of evaluations and laboratory assignments as follows:

- Partially marked laboratories, no midterm (PML-N)
- Voluntary laboratories with online midterm (VL-M)
- Marked laboratories with online midterm (ML-M)
- Online weekly laboratory quizzes, no midterm (WQ-N)

Figure 1. Online final exam marks

3. Study Results

Student online final exam marks in each study, depicted in Figure 1, form a starting point for our comparative analysis; we also evaluate data from a variety of other sources: from other quantitative benchmarks of student performance, from student and instructor surveys, and from student, T.A., and instructor anecdotal evidence. We incorporate results from similar studies where appropriate [1, 2, 4].

3.1 Partially Marked Laboratories With No Midterm (PML-N)

The most dismal performance occurred with partially marked laboratory exercises and no midterm. In this study students were given laboratory exercises to work on each week, with T.A.s available during their regularly scheduled laboratory sessions. We selected a portion of every second laboratory to be marked, so that a total of 6 submissions were made over the term. Student survey results show that students generally did not attempt the voluntary laboratory exercises, but worked merely on the questions that were to be marked. Although the voluntary laboratory exercises accounted for 90% of assigned coursework, the majority of students reported allocating them just 0-25% of the total time they spent working on laboratory exercises.

Evidence also suggests that a significant amount of copying occurred. Student survey results show that for marked laboratory exercises, 87% of students, at some point, gave their completed questions to other students before the due date, while 72% had used other students’ completed questions to help them with their own. In fact, 48% of students reported giving away their assignments frequently, and 38% reported using others’ assignments frequently. Incredibly, students reported that online evaluations motivated them to cheat less on their submitted laboratory questions.

Further evidence of the copying problem lies in the comparison between student laboratory grades and final exam grades, shown in Figure 2. Since the final exam questions were taken directly from the marked laboratory exercises (some with slight modification), it is reasonable to expect similar grades for the final exam and the marked laboratory questions. However, as shown in Figure 2, 85% of students received A-level grades and none failed in the marked laboratory component, whereas in the final exam only 21% of students obtained A-level grades and 30% failed.

The results from Figure 1 show this study has the largest number of failures and D-level grades, and the smallest number of A-level grades. Clearly, this study was not successful from the point of view of enhancing student efficacy and motivation in the practical components of the course.

Figure 2. Final exam vs. laboratory marks (PML-N)

3.2 Voluntary Laboratories with Midterm (VL-M)

The next-to-worst results occurred when laboratory exercises were completely voluntary, but an online midterm was included (VL-M). Evidence from student and T.A. surveys suggests students did not devote time to the voluntary laboratory exercises and thus were very unprepared for the online midterm test. Figure 3 shows the distribution of online midterm marks for this study compared to the online final exam marks. A majority of students report their poor performance on the midterm subsequently motivated them to learn enough practical course material to achieve a higher mark in the online final exam. A majority also report that familiarity with the online testing process, obtained via the midterm, also helped them achieve a higher mark on the final exam.
Figure 3. Midterm vs. final exam marks (VL-M)

It is interesting to compare the midterm marks of these students (VL-M) with the final exam marks of the students in the “worst” group (PML-N), since these are the first online evaluations of the term for each group. Both marks are generally poor; however, students in VL-M got a “second chance” because this was the first of their two online evaluations.

Comparing data from these two studies (PML-N and VL-M), we conclude that there is little difference between having completely voluntary and mostly voluntary laboratory exercises, because our students tend to ignore voluntary assignments. Our results, and corroborating results from Califf [1] and English [4], show that a limitation of online testing is that students will tend to perform poorly on the first online evaluation. Therefore, it is essential to have at least one online evaluation prior to the final exam, because it is this evaluation that motivates students to take the practical aspects of the course more seriously and provides them experience working in the online testing environment.

3.3 Marked Laboratories with Midterm (ML-M)

The second-best student performance occurred when all laboratory exercises were marked, and an online midterm test was given. Figure 1 shows that 40% of students achieved a final exam mark in the A-range, and 22% failed. Final exam grades in this study were a marked improvement over those of PML-N and VL-M. Comparing these three studies, we note that even though students may cheat, it appears preferable to require practical work to be submitted frequently, as opposed to rarely or not at all.

Students and instructors were pleased with the results of this study, but the instructors were not satisfied. We noted that with online midterm and final evaluations, a substantial number of students, 30%-40%, now achieved marks in the A-range, but the same number also failed the course or achieved D-range marks. We found it especially discouraging that the failing and marginal students often did poorly on exam questions that were taken directly from their laboratory exercises. Survey results showed cheating was reduced but not eliminated by online evaluations, and this was clearly evident among the marginal and failing students. In fact, in ML-M, of those students who received F-level or D-level grades on the final exam, 77% had achieved an A-level grade on marked laboratory exercises, 17% a B-level grade, and 6% a C-level grade. Not surprisingly, these were the students who felt least that online evaluations motivated them to stop cheating. We hoped to target these marginal students with our WQ-N study.

3.4 Online Weekly Laboratory Quizzes With No Midterm (WQ-N)

This study included 12 online quizzes (one per week) as well as the online final exam. Laboratory exercises were voluntary; however, at the end of each week, portions of the laboratory exercises for that week were selected for an online quiz. Because the scheduled laboratory time was devoted to the online quiz each week, we scheduled 12 hours per week of free in-laboratory T.A. consulting time. A more detailed description of this study is presented in [14].

Given our previous success with enhancing motivation by employing online evaluations, we predicted even more success with this study. We expected that students would be motivated to complete and understand the weekly laboratory exercises, and to utilize the scheduled T.A. hours. We suspected that perhaps students would not immediately realize that doing and understanding the laboratory work was essential, but that after failing one or two initial quizzes, they would reform and begin the diligent laboratory work we expected. We were wrong.

Most students failed miserably on the first four quizzes; on quizzes 5 and 6 about half were passing, but with very poor marks. On the midterm quiz, the average grade was in the D-range. Evidence showed that students had not been working on their laboratory exercises at all. We obtained this evidence by asking concerned students to submit all of their “rough” laboratory work, and finding none had any work to submit. T.A. records showed only a handful of students had used the consulting service during the first half of the term.

In order to motivate students to begin working on their laboratory assignments, we decided to count only the last 6 quizzes toward their final mark. The students worked diligently in the second half of the term, heavily utilizing T.A. support. Only 7% of students obtained an overall failing grade in the quizzes, and 42% achieved an A-level grade. Figure 4 shows students’ overall quiz grades as well as their final exam grades. We believe quiz grades are a good indication of students’ practical knowledge because of the high correlation between quiz marks and final exam marks.
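The degree of agreement between two such mark series can be quantified with a standard Pearson correlation. The following Python sketch is purely illustrative: the mark lists are hypothetical placeholders, not data from our studies, and no particular analysis tooling is implied.

import statistics

# Hypothetical per-student marks (percentages); not study data.
quiz_marks = [82, 55, 91, 47, 73, 66, 88]   # overall quiz mark
exam_marks = [78, 50, 94, 41, 70, 69, 85]   # online final exam mark

# Pearson r near 1.0 indicates quiz marks closely track exam marks.
r = statistics.correlation(quiz_marks, exam_marks)  # Python 3.10+
print(f"Pearson r = {r:.2f}")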
In Figure 1, the final exam marks in this WQ-N study are compared to those of the other studies. Note that the implementation of online weekly quizzes dramatically reduced failure rates. Students who would have otherwise failed moved into the C- and D-ranges. Students from the D- and C-ranges moved into the A- and B-ranges. In the WQ-N study, students achieved a mark approximately one full letter grade above those in other studies, and because of our secure, online evaluations, we were certain that these students deserved their grades. When all 4 studies are compared, it appears unnecessary to require students to submit laboratory assignments for grading when weekly online quizzes of laboratory content are employed. However, lacking such frequent online evaluations, it is preferable to require frequent submission of laboratory assignments.

Figure 4. Quiz and final exam marks (WQ-N)

4. Discussion of Results

Results of student surveys from the studies show that students believe conventional tests are more likely to pass or assign higher marks to students who lack practical skills in the course. They think online tests are a better indicator of their practical skills, and that tests of these skills could not be achieved as well by conventional evaluations. They feel that student marks on online evaluations are a good indication of whether or not students cheated on marked laboratory assignments. A majority report that online tests motivate them not to cheat on marked laboratory assignments, and motivate them to attend class and laboratory sessions. We found a slight positive correlation between students’ marks and their motivation not to cheat: the poorer the student’s mark, the less motivated they feel not to cheat on marked laboratory exercises, although they are still more motivated than with conventional tests. Interestingly, survey results are similar among the various studies. There are no significant correlations between a student’s course mark and any survey question.

Student/instructor opinions and test results from our studies lead us to conclude that having an online final exam, with at least one prior online evaluation, has the advantage of bringing students to a higher standard, of reducing the level of cheating and copying, and of encouraging students to attain the practical skills expected of them in the course. Our results show that students are most likely to attain the practical skills expected of them when the course contains frequent, online evaluations. When the only online evaluations are the midterm and final exam, students’ practical skills are better when laboratory exercises are graded frequently.

Because these studies took place in different years, we hypothesised that students might perform increasingly well on the laboratory exercises, and thus online tests, as the years progressed, because they would have access to an ever increasing set of solutions from students of previous years. Even if students do have access to solutions, it does not appear that this results in higher test scores. The most recent study was PML-N, in which students showed the lowest overall performance of all studies. The second-last study was WQ-N, in which students showed even lower performance than PML-N in the first half of the term. Student performance, however, did increase to the overall best of the studies in the last half of the term. It is clear that this resulted from students working on the exercises themselves; it is unclear whether or not these students did have solutions and, if so, the extent to which this contributed to their ability to do the exercises themselves. We plan further study in this area.

Through anecdotal evidence and student surveys, we found that student stress was a significant factor in our studies. Both Califf and Chamillard [1, 2] noted this as well. Chamillard reports that in future classes, the number of online evaluations will be increased from 2 to 3, partly to reduce the “stress associated with taking each practicum.” In the first year of the Califf study, they found that students “felt major stress taking the exam” because students had to write complete programs from scratch and debug them, but they had little experience in these areas. In the subsequent year, the instructors modified the laboratory exercises so that these topics were more thoroughly covered, and instituted several short quizzes in these areas during the term. The authors report students were “much less stressed” in the second year, and achieved higher grades in the year-end online test.

Our data shows students believe they would feel less stress on the online final exam if they had the “practice” of an online midterm. Following this logic, we expect students in our WQ-N study to have suffered the least online final exam stress, since they had 12 previous online tests that term; we expect the most online final exam stress to have occurred in PML-N, when students had no practice before the online final exam. Survey results from the PML-N study show that a majority of students found our online final exam more stressful than traditional exams. There was also a significant negative correlation between (a) stress felt and (b) believing their mark was higher because the test was online (vs. traditional). Unfortunately, we cannot directly compare this with WQ-N because that survey did not include questions relating to stress. However, other data is consistent with our expectations: approximately 15% of students in the PML-N study complained to the instructor that they did not work to their potential because of the stress of the online test, while there were no complaints in the WQ-N study. Anecdotal evidence concurs: in the WQ-N study, none of the students appeared particularly stressed, while in the PML-N study about 25% of the students appeared stressed during the final exam, and about 5% seemed to panic at some point. Our evidence, as well as that of Califf and Chamillard [1, 2], implies that in order to mitigate the variable of stress in our studies, we must provide at least one online evaluation before the final one.

5. Study Environment

We attempted to control extraneous variables in our studies by having similar online testing and laboratory environments, similar test questions, similar laboratory assignments, and by having the same instructor mark all online evaluations for all studies.
The online test environment for the WQ-N, ML-M, and VL-M studies was identical and text-based. In the PML-N study we moved to a GUI environment matching the students’ usual setting. The effects of using a matching environment are unclear, as environmental results are conflated with other effects. Student opinion, however, is that having the matching environment is preferable.

The online test questions varied slightly, but were “the same” in that they were taken from the students’ laboratory exercises, some with slight modification. The laboratory exercises were almost identical in all the studies.

6. Automatic Evaluation

Automatic marking in this course is especially valid because it is simply a practicum course, and we do not grade on elements such as style, algorithms, or documentation, as would a CS1 course. We are concerned mainly with code being configured correctly and producing correct output. In the WQ-N study, all of the online quizzes were marked in a fully automated test harness, while the online final exam was marked manually. We believe the high correlation between these marks, shown in Figure 4, is evidence of the accuracy of our automatic marking harness.

Even when marking “manually”, the instructor employed a semi-automated test harness, which executed students’ code. When the test harness reported correct output, the code then required only a brief, visual inspection, which reduced overall marking time. Programs that did not pass all tests, or did not compile, were marked in the normal fashion, although online. Other studies with online evaluation reported similar reductions in marking time [1, 2, 4]. In other work we more fully report on our methods of online evaluation [7, 13, 14], and on the online marking environment we developed [8, 15].
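Our marking environment itself is reported in [8, 15]. Purely as a minimal sketch of the semi-automated workflow just described, the following Python fragment compiles a C submission, runs it against expected-output test cases, and routes it either to a brief visual inspection (all tests passed) or to normal manual marking (compile failure, crash, timeout, or wrong output). The directory layout, test data, and compiler invocation are illustrative assumptions, not our actual configuration.

import subprocess
from pathlib import Path

# Hypothetical (stdin, expected stdout) pairs for one exam question.
TESTS = [
    ("3 4\n", "7\n"),
    ("10 -2\n", "8\n"),
]

def mark_submission(src: Path) -> str:
    """Return 'inspect' (all tests passed; brief visual check only)
    or 'manual' (compile failure or wrong output; mark as usual)."""
    exe = src.with_suffix("")  # e.g. submissions/jsmith.c -> submissions/jsmith
    build = subprocess.run(["cc", str(src), "-o", str(exe)],
                           capture_output=True, text=True)
    if build.returncode != 0:
        return "manual"                # did not compile
    for stdin_data, expected in TESTS:
        try:
            run = subprocess.run([str(exe)], input=stdin_data, timeout=5,
                                 capture_output=True, text=True)
        except subprocess.TimeoutExpired:
            return "manual"            # hung or looped
        if run.returncode != 0 or run.stdout != expected:
            return "manual"            # crashed or produced wrong output
    return "inspect"                   # correct output on every test

if __name__ == "__main__":
    for src in sorted(Path("submissions").glob("*.c")):
        print(f"{src.name}: {mark_submission(src)}")

Only submissions that pass every test reach the quick visual inspection; everything else falls back to ordinary marking, mirroring the division of labour described above.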
8. Conclusions

Students and instructors agree that online testing of students’ practical skills provides a more accurate measure of student efficacy. This opinion is supported by the data we have collected over five academic years, comparing student performance on online tests in a variety of scenarios. The data shows that it is imperative to provide more than one online evaluation per session: this mitigates the effects of student stress, allows the students to “practice” before the final test, motivates them to acquire the expected skills, and results in a more realistic score on the final online evaluation. Students best attain the expected skills when many online evaluations are incorporated into the course, and in this case, traditional, non-online evaluations are unnecessary. Instructors and students believe that our online tests hold students to a higher standard and motivate them to strive to achieve a higher level of practical competency in our course.

References

[1] Califf, M.E. and Goodwin, M. Testing skills and knowledge: Introducing a laboratory exam in CS1. SIGCSE Bulletin, 34.1 (2002), 217-221.

[2] Chamillard, A.T. and Joiner, J.K. Using lab practica to evaluate programming ability. SIGCSE Bulletin, 33.1 (2001), 159-163.

[3] Cole, S. and McCabe, D.L. Issues in academic integrity. New Directions for Student Services, Jossey-Bass Publishers (1996), 67-77.

[4] English, J. Experience with a computer-assisted formal programming examination. Proc. 7th Annual Conference on Innovation and Technology in Computer Science Education (Aarhus, Denmark, 2002), 51-54.

[5] Graham, A.M., Monday, J., O’Brien, K. and Steffen, S. Cheating at small colleges: An examination of student and faculty attitudes and behaviours. Journal of College Student Development, 35 (1994), 255-260.

[6] Haines, V.J., Diekhoff, G.M., LaBeff, E.E. and Clark, R.E. College cheating: Immaturity, lack of commitment, and the neutralizing attitude. Research in Higher Education, 25 (1986), 342-354.

[7] Mason, D. and Woit, D. Integrating technology into computer science examinations. SIGCSE Bulletin, 30.1 (1998), 140-144.

[8] Mason, D., Woit, D., Abdullah, A., Barakat, H., Pires, C. and D’Souza, M. Web-based evaluation for the convenience of students, markers, and faculty. Proc. N.A.Web’99 Conference (October 1999).

[9] Newstead, S.E., Franklyn-Stokes, A. and Armstead, P. Individual differences in student cheating. Journal of Educational Psychology, 88 (1996), 229-241.

[10] Roberts, P., Anderson, J. and Yanish, P. Academic misconduct: Where do we start? Northern Rocky Mountain Educational Research Association, Jackson, Wyoming (1997).

[11] Schemo, D.J. Degree of dishonour. The Age, Melbourne (2001), 16.

[12] Sheard, J., Dick, M., Markham, S., Macdonald, I. and Walsh, M. Cheating and plagiarism: Perceptions and practices of first year IT students. Proc. 7th Annual Conference on Innovation and Technology in Computer Science Education (Aarhus, Denmark, 2002), 183-187.

[13] Woit, D. and Mason, D. Lessons from on-line programming examinations. SIGCSE Bulletin, 30.3 (1998), 157-259.

[14] Woit, D. and Mason, D. Enhancing student learning through on-line quizzes. SIGCSE Bulletin, 32.1 (2000), 367-371.

[15] Woit, D. and Mason, D. Evaluation methods for in-context, consistent comments and hierarchical marking. In Bruce Mann (Ed.), Perspectives in Web Course Management, Chapter 13. Canadian Scholars’ Press (2000). ISBN 1-55130-143-1.
