23 May 2023

Reflection

Law teaching often embeds formulaic 'reflective writing'. 'Can large language models write reflectively' by Yuheng Li, Lele Sha, Lixiang Yan, Jionghao Lin, Mladen Raković, Kirsten Galbraith, Kayley Lyons, Dragan Gašević and Guanliang Chen in (2023) 4 Computers and Education: Artificial Intelligence comments

Generative Large Language Models (LLMs) demonstrate impressive results in different writing tasks and have already attracted much attention from researchers and practitioners. However, there is limited research investigating the capability of generative LLMs for reflective writing. To this end, in the present study, we extensively reviewed the existing literature and selected 9 representative prompting strategies for ChatGPT – the chatbot based on state-of-the-art generative LLMs – to generate a diverse set of reflective responses, which were combined with student-written reflections. Next, those responses were evaluated by experienced teaching staff following a theory-aligned assessment rubric that was designed to evaluate student-generated reflections in several university-level pharmacy courses. Furthermore, we explored the extent to which Deep Learning classification methods can be utilised to automatically differentiate between reflective responses written by students and reflective responses generated by ChatGPT. To this end, we harnessed BERT, a state-of-the-art Deep Learning classifier, and compared the performance of this classifier to the performance of human evaluators and the AI content detector by OpenAI. Following our extensive experimentation, we found that (i) ChatGPT may be capable of generating high-quality reflective responses in writing assignments administered across different pharmacy courses, (ii) the quality of automatically generated reflective responses was higher on all six assessment criteria than the quality of student-written reflections; and (iii) a domain-specific BERT-based classifier could effectively differentiate between student-written and ChatGPT-generated reflections, greatly surpassing (by up to 38% across four accuracy metrics) both experienced teaching staff and a general-domain classifier, even in cases where the testing prompts were not known at the time of model training. ...
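To make the generation step concrete, the following is a minimal Python sketch of the kind of prompting pipeline the abstract describes: sending a reflective-writing assignment to ChatGPT under one framing and collecting the response. It uses the openai package's chat completion endpoint (the pre-1.0 interface current in early 2023); the assignment text and persona instruction are hypothetical stand-ins, not the paper's actual nine prompting strategies, which are not reproduced in this post.

import openai

openai.api_key = "YOUR_API_KEY"  # supply your own key

# Hypothetical assignment text; the study's real prompts are not reproduced here
ASSIGNMENT = (
    "Reflect on a recent patient-counselling placement: what went well, "
    "what you would change, and what you learned about your own practice."
)

def generate_reflection(persona_instruction: str) -> str:
    """Ask the chat model for a reflective response under one prompting strategy."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            # Each prompting strategy would vary this framing, e.g. role-playing
            # a pharmacy student or asking for a given depth of reflection
            {"role": "system", "content": persona_instruction},
            {"role": "user", "content": ASSIGNMENT},
        ],
        temperature=0.7,  # some randomness, so repeated calls yield varied responses
    )
    return response["choices"][0]["message"]["content"]

print(generate_reflection(
    "You are a second-year pharmacy student writing a course reflection."))

Repeating such calls across different framings is how one would assemble the 'diverse set of reflective responses' the abstract mentions.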

Educators frequently administer reflective writing tasks to elicit students' reflections on their prior learning experiences, within or outside a particular course (Mann et al., 2009). Engagement in this writing task has been shown to promote the development of critical thinking and problem-solving, an important set of skills that can benefit lifelong learning (Charon & Hermann, 2012). However, recent advancements in generative language models have raised concerns among educators administering written assignments (Kung et al., 2022, Susnjak, 2022, Yan et al., 2023). For instance, by using generative language models to automatically draft their reflective written responses, some students may miss the opportunity to engage in authentic and critical reflection on their own learning experiences. As a result, some students may fail to evaluate the learning strategies they used in the past and modify them to ensure more productive learning in the future (Raković et al., 2022). More importantly, instructors cannot then give tailored feedback to improve student learning.

In particular, ChatGPT, a recently released chatbot based on artificial intelligence (AI), has demonstrated the potential to comprehend different requests from users and, per those requests, generate relevant and insightful texts for different purposes and in different domains, e.g., journal articles (Pavlik, 2023), financial reports (Wenzlaff & Spaeth, 2022), and academic literature reviews (Aydın & Karaarslan, 2022). Despite the promise of ChatGPT to make text generation more efficient, many educators are concerned about the potentially detrimental effects of using automatic text generation methods to facilitate student writing, including reflective writing (Kung et al., 2022, Stokel-Walker, 2022, Susnjak, 2022). However, those concerns have not yet been empirically supported in the context of reflective writing. More research is thus needed to empirically document and understand the capabilities of cutting-edge text generation methods to produce reflective writing, thus providing educators and researchers with new insights regarding the use of these methods. In addition, it remains unknown whether and how AI-generated writing can be accurately differentiated from students' original work. This may be particularly important for educational stakeholders aiming to identify and prevent academic misconduct, e.g., reflective essays generated by ChatGPT but submitted as students' original work (Stokel-Walker, 2022). To address these challenges, in this study we set out to (1) empirically examine the quality of reflective responses generated by ChatGPT and (2) empirically investigate the use of state-of-the-art classification approaches to differentiate between responses generated by ChatGPT and responses originally written by students.

Accordingly, we posed the following Research Questions (RQs): RQ1 – Can ChatGPT generate high-quality reflective writings? RQ2 – To what extent are reflective responses generated by ChatGPT distinguishable from reflective responses written by university students? To answer the RQs, we have extensively prompted ChatGPT to generate a diverse set of reflective writings. We also involved experienced teaching staff to evaluate the reflective depth presented in the writings. Lastly, we compared the differentiation performance (i.e., whether the ChatGPT-generated writings could be differentiated from student-written ones) among (i) experienced teaching staff; (ii) the state-of-the-art AI text detector released by OpenAI; and (iii) a BERT-based classifier (Devlin et al., 2018) fine-tuned on reflective writings generated by ChatGPT and written by students. 
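For readers curious what such a fine-tuned classifier involves in practice, the following is a minimal sketch, assuming the Hugging Face transformers library and toy placeholder data rather than the authors' dataset: bert-base-uncased fine-tuned for binary classification of student-written versus ChatGPT-generated reflections.

import torch
from torch.utils.data import DataLoader, Dataset
from transformers import BertTokenizerFast, BertForSequenceClassification

# Toy placeholder data; the study's reflections and labels are not public here
texts = ["I struggled with dosing calculations at first and asked my preceptor for help.",
         "Reflecting on this experience, I recognise several key insights about my practice."]
labels = [0, 1]  # 0 = student-written, 1 = ChatGPT-generated

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

class ReflectionDataset(Dataset):
    """Wraps tokenised reflections and their human/AI labels."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True,
                             max_length=512, return_tensors="pt")
        self.labels = torch.tensor(labels)
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item

loader = DataLoader(ReflectionDataset(texts, labels), batch_size=2, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):  # a few epochs is typical for fine-tuning
    for batch in loader:
        optimizer.zero_grad()
        loss = model(**batch).loss  # cross-entropy over the two classes
        loss.backward()
        optimizer.step()

# Inference: probability that a new reflection is ChatGPT-generated
model.eval()
with torch.no_grad():
    enc = tokenizer("This placement taught me to double-check my calculations.",
                    return_tensors="pt", truncation=True, max_length=512)
    probs = torch.softmax(model(**enc).logits, dim=-1)
print(f"P(ChatGPT-generated) = {probs[0, 1]:.2f}")

Fine-tuning on in-domain reflections like this is what would make such a classifier 'domain-specific', as distinct from a general-domain detector trained on generic human-versus-AI text.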

The contribution of this paper is two-fold: 1) we illustrated the capability of state-of-the-art large language models, specifically ChatGPT, to generate reflective writings, and the quality of this ChatGPT-generated content compared to student-written work, and 2) we developed a BERT-based classifier for distinguishing between AI-generated and student-written reflective writings. These timely contributions could inform educational researchers and practitioners about the potential impacts of ChatGPT and other large language models on reflective writing tasks; for example, students might miss out on the opportunity to engage in cognitive reflection if they choose to use ChatGPT to generate reflective writing simply to complete an assessment.