lectures and one 170-minute lab period per week, three exams (two midterms and a final), and seven individual programming assignments. The course enrolled 140 students, 129 of whom finished the course and received a grade. 108 of these students (100 men, 8 women) consented to releasing their learning process data (automatically collected within the SPE) and course grades data for this study.

3.2 Data Collection, Sampling, and Analysis Method

Our SPE (see [8]) was used to collect students' online social activity and programming behaviors. The SPE collected 21,952,494 points of interaction and 10,720 discussion posts. In an attempt to make the data manageable, we employed a principled approach to sampling the content. To this end, we randomly sampled 10 students who received each of the five possible course grades (A, B, C, D, and F). While we were able to sample 10 students from each of the A, B, and C levels, only four students were available who received D's in the course, and only two students were available who received F's. Therefore, our sample could include only 36 students, instead of the 50 that we targeted.

Having chosen our sample, we next identified the 2,352 instances in which students in our sample were involved in some kind of social activity, either a post or reply on the SPE's activity stream. Since our research questions related to the interplay of social and programming behaviors, we opted to focus only on social activities (posts and replies) that had to do with programming activities. This refinement yielded 461 posts. Finally, in order to focus our study even more intently on our research questions, we further pruned the sample to include only posts and replies that were part of a thread that was initiated by a programming question. This resulted in a final sample of 93 threads.

For each of the 93 threads of social activity in our sample, we carried out the analyses described in Table 1 in order to provide a foundation for addressing our research questions.

In order to categorize question content (#1 in Table 1), we extended a previously used content coding scheme (see [13]), which we present in Table 2. To verify the validity of this coding scheme for the present analysis, the two authors independently coded a 20% sample of the corpus (n = 19 posts), attaining an overall agreement of 88% (0.83 kappa). Having established a sufficiently high level of inter-rater reliability for this coding scheme, the first author coded the remaining posts.

Table 1. Analyses of Programming Questions

| # | Description of Analysis | Relevant RQs |
|---|---|---|
| 1 | Categorize question content | RQ 1 |
| 2 | Determine whether question related to question author's current build state | RQ 2, RQ 3 |
| 3 | Determine whether responses to question were posted | RQ 2, RQ 3 |
| 4 | Determine whether programming suggestions to address question were posted | RQ 2, RQ 3 |
| 5 | Determine whether question author explicitly acknowledged suggestions (with, e.g., "thank you") | RQ 2 |
| 6 | Determine whether question author resolved the question in a future build | RQ 2 |
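The inter-rater figures reported above (88% agreement, 0.83 kappa) can be illustrated with a minimal sketch of how such statistics are typically computed. The text does not state which kappa statistic was used; the sketch assumes Cohen's kappa for two raters. The function name `agreement_and_kappa` and the example labels are hypothetical and are not the study's data.

```python
from collections import Counter


def agreement_and_kappa(codes_a, codes_b):
    """Return (observed agreement, Cohen's kappa) for two raters.

    Assumption: each rater assigns exactly one category per post.
    """
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both raters must code the same non-empty set of posts")

    n = len(codes_a)
    # Proportion of posts on which the two raters chose the same category.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n

    # Agreement expected by chance, from each rater's category frequencies.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical category throughout.
        return observed, 1.0
    return observed, (observed - expected) / (1.0 - expected)


# Hypothetical codes for a handful of posts (not taken from the study).
rater_1 = ["COMPILE", "IDE", "LANGUAGE", "IMPLEMENTATION", "COMPILE", "RESOURCES"]
rater_2 = ["COMPILE", "IDE", "LANGUAGE", "LANGUAGE", "COMPILE", "RESOURCES"]
observed, kappa = agreement_and_kappa(rater_1, rater_2)
print(f"agreement={observed:.2f}, kappa={kappa:.2f}")
```

Kappa is lower than raw agreement whenever some agreement would be expected by chance, which is why both figures are usually reported together.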
Table 2. Programming Question Content Categories

| Category | Description | Example |
|---|---|---|
| COMPILE | Question relates to an issue encountered during program compilation. | "After I debug my code, I got this error, can anyone explain it to me? thank you Error 1 error LNK2019: unresolved external symbol […]" |
| IDE | Question relates to the operation of Visual Studio. | "Okay, does anybody else consistently get the problem where cin and cout are underlined with red and VS gives you the error "cout is ambiguous"?" |
| IMPLEMENTATION | Question asks for tips on how to best implement an algorithm or function. This is often, but not always, related to the requirements of a given lab or assignment. | "If both the player and the computer draw the same type of hand(say two pairs), who wins?" |
| LANGUAGE | Question asks about the C/C++ language, or about a programming issue related to the misunderstanding of syntax. | "Does anyone know if the strtok() keeps the old string or does it fully erase it?" |
| RESOURCES | Question requests external programming resources or tips. | "What did you guys use to create your UML diagrams?" |

In order to increase the likelihood that any progress made by question authors was influenced by their interaction in the corresponding activity stream thread, we considered subsequent builds that occurred within a reasonable time (two days) of the question author's last post to the thread. However, given that some students compiled their code infrequently, we required that a minimum of five compilations be considered. In some cases, examining a minimum of five compilations led us to consider compilations beyond the two-day window.

Two caveats come with the analysis approach just described. First, because it requires evidence that (a) programming questions be related to programming context, and (b) programming progress fall within a reasonable time after related programming questions are answered, our analysis approach may be seen as overly conservative. For example, it would have been possible for students to have asked a related coding question before they started coding their solution. Likewise, it would have been possible for students to have made positive strides towards a correct solution outside of our two-day, five-build window. Thus, it is possible that our analysis approach failed to identify some relationships between programming and social behavior.

Second, just because we are able to find evidence of positive programming progress that closely follows related social activity, this does not mean that such progress was made because of the social activity; indeed, such an assumption would fall prey to the post hoc, ergo propter hoc fallacy. Clearly, our study is correlational, not causal. These two caveats should be borne in mind in interpreting the results that follow.
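The two-day, five-build selection rule described above could be sketched as follows. This is one reading of the rule and not the authors' actual tooling: it assumes that all builds within two days of the question author's last post are examined, and that when fewer than five fall inside that window, the selection extends to the first five subsequent builds. The function name `builds_to_examine` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def builds_to_examine(last_post, build_times,
                      window=timedelta(days=2), min_builds=5):
    """Select the builds to inspect after the author's last post.

    Assumed rule: take every build inside the window; if that yields
    fewer than min_builds, take the first min_builds later builds
    instead, even if some fall outside the window.
    """
    # Only builds made after the last post can reflect the thread's influence.
    later = sorted(t for t in build_times if t > last_post)
    in_window = [t for t in later if t - last_post <= window]
    if len(in_window) >= min_builds:
        return in_window
    # Infrequent compilers: extend past the window to reach the minimum.
    return later[:min_builds]


# Hypothetical timestamps: two builds inside the window, the rest beyond it.
last_post = datetime(2014, 3, 1, 12, 0)
builds = [last_post + timedelta(hours=h) for h in (1, 5, 72, 96, 120, 144)]
selected = builds_to_examine(last_post, builds)
print(len(selected), "builds selected")
```

Under this reading, a student with fewer than five later builds in total simply contributes all of them.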