07 September 2022

Cheating

‘On the Efficacy of Online Proctoring using Proctorio’ by Laura Bergmans, Nacir Bouali, Marloes Luttikhuis and Arend Rensink in Proceedings of the 13th International Conference on Computer Supported Education 1 (CSEDU 2021) 279-290 comments 

In this paper we report on the outcome of a controlled experiment using one of the widely available and used online proctoring systems, Proctorio. The system uses an AI-based algorithm to automatically flag suspicious behaviour, which can then be checked by a human agent. The experiment involved 30 students, 6 of whom were asked to cheat in various ways, while 5 others were asked to behave nervously but take the test honestly. This took place in the context of a Computer Science programme, so the technical competence of the students in using and abusing the system can be considered far above average. 

The most important findings were that none of the cheating students were flagged by Proctorio, whereas only one (out of 6) was caught out by an independent check by a human agent. The sensitivity of Proctorio, based on this experience, should therefore be put at very close to zero. On the positive side, the students found (on the whole) the system easy to set up and work with, and believed (in the majority) that the use of online proctoring per se would act as a deterrent to cheating. 

The use of online proctoring is therefore best compared to taking a placebo: it has some positive influence, not because it works but because people believe that it works, or that it might work. In practice however, before adopting this solution, policy makers would do well to balance the cost of deploying it (which can be considerable) against the marginal benefits of this placebo effect. 

The authors state

 All over the world, schools and universities have had to adapt their study programmes to be conducted purely online, because of the conditions imposed by the COVID-19 pandemic. The University of Twente is no exception: from mid-March to the end of August, no teaching-related activities (involving groups) were allowed on-campus. 

Where online teaching has worked at least reasonably well, in that we have by and by found effective ways to organise instruction, tutorials, labs and projects using online means, the same cannot be said for the testing part of the programme. Traditionally, we test our students using a mix of group project work and individual written tests. The latter range from closed-book multiple choice tests to open-book tests with quite wide-ranging, open questions. Such tests are (traditionally) always taken in a controlled setting, where the students are collected in a room for a fixed period, at the start of which they are given their question sheet and at the end of which they hand in their answers. During that period, a certain number of invigilators (in other institutions called proctors) are present to observe the students’ behaviour so as to deter them from cheating — defined as any attempt to answer the questions through other means than those intended and prescribed by the teacher. This system for testing is, we believe, widespread (if not ubiquitous) in education. 

Changing from such a controlled setting to online testing obviously opens up many more opportunities for cheating. It is hard to exaggerate the long-term threat that this poses to our educational system: without reliable testing, the level of our students cannot be assessed and a university (or any other) diploma essentially becomes worthless. We have to do more than just have students write the test online and hope for the best. 

Solutions may be sought in many different directions, ranging from changing the nature of the test altogether (from a written test to some other form, such as a take-home or oral test), to offering multiple or randomised versions to different students, to applying plagiarism checks to the answers, to calling upon the morality of the students and having them sign a pledge of good faith, or any combination of the above. All of these have their pros and cons. In this paper, rather than comparing or combining these measures, we concentrate on one particular solution that has found widespread adoption: that of online proctoring. In particular, we describe an experiment in using one of the three systems for online proctoring that have been recommended in the quickscan by SURF (Quickscan SURF, 2020), a “collaborative organisation for ICT in Dutch education and research” of which all public Dutch institutes of higher education are members. 

Approach. 

Online proctoring refers to the principle of remotely monitoring the actions of a student while she is taking a test, with the idea of detecting behaviour that suggests fraud. The monitoring consists of using the camera, the microphone and typically some degree of control over the student’s computer. The detection can be done by a human being (the proctor, also called invigilator in other parts of the Anglo-Saxon world), or it can be done through some AI-based algorithm — or a combination of both. 

The question we set out to answer in this paper is: how well does it work? In other words, is online proctoring a good way to detect actual cheating, without accusing honest students — in more formal terms: is it both sensitive and specific? How do students experience the use of proctoring? 

In answering this question, we have limited ourselves to a single proctoring system, Proctorio, which is one of the three SURF-approved systems listed in (Quickscan SURF, 2020). The main reason for selecting Proctorio is the usability of the system; it is possible to use it on the majority of operating systems by installing a Google Chrome extension and it can be used for large groups of students. It features automatic detection of behaviour deemed suspicious in a number of categories, ranging from hand and eye movement to computer usage or sound. The teacher can select the categories she wants to take into account, as well as the sensitivity level at which the behaviour is flagged as suspicious, at any point during the proceedings (before, during or after the test). Proctorio outputs an annotated real-time recording for each student, which can be separately checked by the teacher so that the system’s suspicions can be confirmed or negated. The system is described in some detail in Section 2. 

Using Proctorio, we have conducted a controlled randomized trial involving 30 students taking a test specifically set for this experiment. The students were volunteers and were hired for their efforts; their results on the test did not matter to the experiment in any way. The subject of the test was a first-year course that they had taken in the past, meaning that the nature of the questions and the expected kind of answers were familiar. Six out of the 30 students were asked to cheat during the test, in ways to be devised by themselves, so as to fool the online proctor; the rest behaved honestly. Moreover, out of the 24 honest students, five were asked to act nervously; in this way we wanted to try and elicit false positives from the system. 

Besides Proctorio’s capabilities for automatic analysis, we also conducted a human scan of the (annotated) videos, by staff unaware of the role of the students (but aware of the initial findings of Proctorio). We expected that humans would be better than the AI-based algorithm in detecting certain behaviours as cheating, but worse in maintaining a sufficient and even level of attention during the tedious task of monitoring. 

Findings. 

Summarising, our main findings were: The automatic analysis of Proctorio detected none of the cheating students; the human reviewers detected 1 (out of 6). Thus, the percentage of false negatives was very large, pointing to a very low sensitivity of online proctoring. 

None of the honest students were flagged as suspicious by Proctorio, whereas one was suspected by the human reviewer. Thus, the percentage of false positives was zero for the automatic detection, and 4% for the human analysis, pointing to a relatively high specificity achievable by online proctoring (which, however, is quite useless in the light of the disastrous sensitivity). Furthermore, we gained valuable insights into the conditions necessary to make online proctoring an acceptable measure in the opinion of the participating students. 
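
To make the sensitivity and specificity figures concrete, the counts quoted above can be plugged into the standard definitions. The following is a back-of-the-envelope reconstruction from the reported numbers (6 cheating and 24 honest students; Proctorio flagged none of either group, the human reviewer flagged one of each), not a calculation taken from the paper itself:

% Reconstruction from the counts reported above; the 6/24 split is the experiment’s own design.
\[
\text{sensitivity} = \frac{TP}{TP + FN}, \qquad \text{specificity} = \frac{TN}{TN + FP}
\]
\[
\text{Proctorio: } \text{sensitivity} = \tfrac{0}{6} = 0, \quad \text{specificity} = \tfrac{24}{24} = 1
\]
\[
\text{Human review: } \text{sensitivity} = \tfrac{1}{6} \approx 0.17, \quad \text{specificity} = \tfrac{23}{24} \approx 0.96
\]

On this reading, the 4% false-positive figure in the text corresponds to 1/24 ≈ 0.04 for the human reviewer, and the “disastrous sensitivity” is 0/6 for the automatic analysis.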

The outcome of the experiment is presented in more detail in Section 3, and discussed in Section 4 (including threats to validity). After discussing related work (Section 5), in Section 6 we draw some conclusions.

'Cheating in online courses: Evidence from online proctoring' by Seife Dendir and Stockton Maxwell in (2020) 2 Computers in Human Behavior Reports 100033 comments 

This study revives the unsettled debate on the extent of academic dishonesty in online courses. It takes advantage of a quasi-experiment in which online proctoring using webcam recording software was introduced for high-stakes exams in two online courses. Each course remained the same in its structure, content and assessments before and after the introduction of online proctoring. Analysis of exam scores shows that online proctoring was associated with a decrease in average performance in both courses. Furthermore, the decrease in scores persists when accounting for potential confounding factors in a regression framework. Finally, in separate regressions of exam performance on student characteristics, the regression explanatory power was higher for scores under proctoring. We interpret these results as evidence that cheating took place in the online courses prior to proctoring. The results also imply that online proctoring is an effective tool to mitigate academic dishonesty in online courses. 

 The authors state 

In the past two decades, higher education institutions have experienced unprecedented growth in online learning. In the U.S., where this study took place, enrollment in distance higher education grew steadily between 2002 and 2016. Since 2012, whereas overall enrollment in higher education has been declining, growth in distance education has in fact been rising. As of 2016, the latest year for which data are published, close to a third of all college students were taking at least one distance course (Seaman et al., 2018). 

More than half of these distance learners were students that were combining non-distance (face-to-face, F2F) learning with distance learning. Accordingly, today many “traditional” institutions offer a menu of online courses as well as fully online programs. This is prompted by sustained demand for such courses and programs – in 2016, for example, about 30 percent of students in public and private non-profit institutions in the U.S. enrolled in at least one distance learning course (source: own calculation using data in Seaman et al., 2018). It also appears that educators in all types of institutions have recognized that a structural shift has occurred, and that online delivery and learning will be a mainstay of higher education in the future. 

Therefore, the dialogue surrounding online education has turned to how best to deliver online courses. Various aspects of online courses, such as modality (fully online versus hybrid; synchronous versus asynchronous), technology platform, assessment and accessibility are considered and debated. The goal of such dialogue, ultimately, is to design and deliver online courses in which student learning and experience are at least on par with traditional (F2F) courses. Given this goal, the question of how much learning takes place in online courses (relative to the traditional/F2F mode) has become a critical point of contention (see, among others, Cavanaugh & Jacquemin, 2015; Alpert et al., 2016; Dendir, 2019; Paul & Jefferson, 2019). 

A particularly pertinent issue in this regard is academic dishonesty (McCabe et al., 2012). Some argue that even the measures that are used to gauge learning in online courses, such as scores on formative or summative assessments, do not truly reflect learning because they are possibly tainted by cheating that occurs during these assessments (Harmon et al., 2010; Arnold, 2016). If, for example, exam score distributions turn out to be comparable in an online course and its F2F counterpart, it does not mean that comparable learning takes place in the two modes simply because the online scores are likely inflated by cheating. Such arguments are predicated on the assumption that academic dishonesty is more prevalent in online courses than F2F ones (Kennedy et al., 2000; Young, 2012). 

Various arguments are provided as to why online courses could be more amenable to academic dishonesty. One is that because assessments often happen in unsupervised or unproctored settings, it is difficult to confirm the identity of the test taker (Kraglund-Gauthier & Young, 2012). Similarly, online test takers can use unauthorized resources (e.g. cheat sheets, books or online materials) during assessments. Also, the online environment – by the mere absence of a close relationship and interaction with an instructor – can encourage collaborative (group) work with other students (Sendag et al., 2012; McGee, 2013; Hearn Moore et al., 2017). 

While there is also growing empirical evidence showing that academic dishonesty is relatively more common in online learning, the debate is not yet fully settled (Harton et al., 2019; Peled et al., 2019). It is in this context that the current study presents evidence from a quasi/natural experiment that occurred in two online courses at a midsize comprehensive university in the U.S. The experiment involved the introduction of online proctoring using webcam recording software for high-stakes exams. The structure, content and assessments (exams) in each course remained the same before and after the introduction of online proctoring. A change in student performance, if any, can therefore be attributed to the mitigation of cheating after online proctoring came into place, and provides direct evidence on the scale of academic dishonesty in online courses. 

Relative to much of the existing literature, the treatment here is unique because proctoring did not entail a change in modality. Many studies that investigate academic dishonesty typically compare student performance in unproctored online assessments and proctored F2F ones. But a comparison of student performance in proctored F2F exams and unproctored online exams may not be entirely valid because some of the performance differences could be due to the testing environment per se, apart from the effect of supervision (Fask et al., 2014; Butler-Henderson & Crawford, 2020). By comparing performance in the same learning mode (online) but before and after the advent of supervision, this study avoids any such complications. Furthermore, from a practical point of view, in many scenarios in-person proctoring of tests may not be feasible for fully online courses. Therefore, the results of the current study also provide evidence on the efficacy of easily adoptable, relatively low-cost online proctoring in online courses. 

The findings of the study suggest that cheating was taking place in the unsupervised exams. First, simple bivariate analyses show that there was a significant drop in average exam scores in both courses after online proctoring was introduced, in many cases by more than a letter grade. This is despite the fact that student characteristics remained largely similar before and after proctoring (implying sample selection was unlikely to be a factor). Second, explicitly accounting for student characteristics in a multiple regression framework could not explain away the decrease in performance. Finally, a comparison of the explanatory powers of regressions of scores on student ability and maturity indicators showed that they were higher for proctored exams. From these results one can also infer that online proctoring of assessments is a viable strategy to mitigate cheating in online courses. 
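
A stylized way to read the second and third findings, assuming the exam score is regressed on a proctoring indicator plus student characteristics (the notation below is illustrative and not the authors’ exact specification):

% Illustrative specification; variable names are assumptions, not taken from the paper.
\[
\text{score}_i = \beta_0 + \beta_1\,\text{proctored}_i + \mathbf{x}_i'\boldsymbol{\gamma} + \varepsilon_i
\]

Here proctored_i indicates whether student i sat the exam under online proctoring and x_i collects the student characteristics (ability and maturity indicators). A negative and significant estimate of β1 that survives the controls in x_i is the pattern interpreted as unproctored scores having been inflated by cheating, while the third finding compares the R² of regressions of score_i on x_i alone, estimated separately for the unproctored and proctored regimes; the higher R² under proctoring suggests that supervised scores track underlying ability more closely.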

The balance of the paper is organized as follows. The next section reviews the related literature on academic dishonesty. Section 3 describes the setup of the study and data. Section 4 presents bivariate analysis, the regression methodology and results. Section 5 points out some caveats and limitations of the study. The last section concludes and draws a few implications on the basis of the results of the study.