'Perceived Impact of Generative AI on Assessments: Comparing Educator and Student Perspectives in Australia, Cyprus, and the United States' by René F. Kizilcec, Elaine Huber, Elena C. Papanastasiou, Andrew Cram, Christos A. Makridis, Adele Smolansky, Sandris Zeivots and Corina Raduescu in (2024) Computers and Education: Artificial Intelligence comments
The growing use of generative AI tools built on large language models (LLMs) calls the sustainability of traditional assessment practices into question. Tools like OpenAI's ChatGPT can generate eloquent essays on any topic and in any language, write code in various programming languages, and ace most standardized tests, all within seconds. We conducted an international survey of educators and students in higher education to understand and compare their perspectives on the impact of generative AI across various assessment scenarios, building on an established framework for examining the quality of online assessments along six dimensions. Across three universities, 680 students and 87 educators, who moderately use generative AI, consider essay and coding assessments to be most impacted. Educators strongly prefer assessments that are adapted to assume the use of AI and encourage critical thinking, while students' reactions are mixed, in part due to concerns about a loss of creativity. The findings show the importance of engaging educators and students in assessment reform efforts to focus on the process of learning over its outputs, alongside higher-order thinking and authentic applications.
In a remarkable convergence of research and real-world impact, the sudden emergence of ChatGPT has sent shockwaves through the global landscape of education. As students, educators, and university administrators grapple with the practical implications of generative AI, it becomes abundantly clear that we stand at the precipice of a new era. Since OpenAI released GPT-3, a groundbreaking large language model (LLM), and its offspring, the user-friendly ChatGPT conversational interface, researchers have been both excited and filled with trepidation over its boundless possibilities and transformative potential (Cotton et al., 2023; Farazouli et al., 2023; Nikolic et al., 2023). Generative AI, as defined by Weng (2023), refers to a technology that utilizes deep learning models to generate content that closely resembles human expression in response to complex and diverse prompts. These tools can produce conversational-style text that closely resembles human writing, as well as other visual and auditory media. They can be used to create systems that operate in ways that resemble human cognition and behavior (Siemens et al., 2022; Markel et al., 2023; Park et al., 2023). For example, ChatGPT and its derivatives are increasingly utilized for language translation, human-like conversation with chatbots, and writing articles, stories, computer code, and other forms of written content (Cotton et al., 2023).
Generative AI tools promise many benefits in education, such as increasing student engagement in learning tasks, providing timely feedback, aiding research and collaboration, and improving accessibility (Kasneci et al., 2023). For example, AI technology can provide immediate feedback via automated grading (Mate and Weidenhofer, 2022) and facilitate the provision of meaningful feedback in large cohorts (Bernius et al., 2022). At the same time, AI raises serious concerns about the validity of widely used assessment practices, especially concerns about academic integrity and bypassing important learning processes (Swiecki et al., 2022). Because standard assessment practices measure learning by evaluating final products such as essays, researchers have highlighted the potential for plagiarism as a key challenge with using ChatGPT for assessment in higher education (Cotton et al., 2023). Students can potentially use generative AI tools like ChatGPT to cheat on online assessments by submitting essays that are not their own work. The problem might be more prevalent in online assessments, where students tend to feel more distant from their instructors (Papanastasiou and Solomonidou, 2023).
Educators can face challenges distinguishing between students' own work and responses generated by AI tools, making it difficult to assess students' level of understanding and their ability to apply the material (Mao et al., 2024). Unless educators and academic institutions adapt to this new reality, generative AI can undermine both academic integrity in online assessments and the purpose of higher education to educate students, which may reduce the signaling effects and inherent value of formal educational attainment (Cotton et al., 2023). To address this major problem, scholars have called for applying AI in classrooms in a way that promotes self-regulated and more productive learning, rather than treating it as a replacement for human effort in the learning process (Hopfenbeck et al., 2023; Mao et al., 2024; Swiecki et al., 2022).
AI has been framed as a transformative resource that educators and students can leverage in teaching and learning. Weng (2023) suggests ways to employ generative AI tools, such as raising awareness of these tools, using them in class and in assessments, and engaging in discussions with students about their promises and challenges. They argue that this is more productive than either banning them or giving them a central role in the curriculum. Integrating generative AI with assessments can also transform assessment practices and experiences, for example, by immersing students in simulated learning environments where they can safely and repeatedly practice skills (Markel et al., 2023). This paradigm shift may require the development of new assessment approaches and policies that achieve a balance between the advantages of AI and the imperative to maintain academic integrity (Chan and Chen, 2023).
Bearman et al. (2023) argued that educational assessment practice has not kept up with the digital transformation. Students and educators require better guidance on how to engage in meaningful interactions with AI systems for the purpose of assessment (Viberg et al., 2024). These interactions would directly assess students' learning process, critical thinking, and evaluative skills, not just their knowledge and comprehension. To this end, we expect to see revised guidelines and recommendations for educational assessment policies, incorporating input from stakeholders involved in assessment design to address the two major questions around AI integration in education: ‘what’ to assess (Sabzalieva and Valentini, 2023) and ‘how’ to assess it (Chan and Chen, 2023).
There are many ongoing conversations around what types of assessments are needed given the capabilities of generative AI tools. Bearman and Luckin (2020) emphasize that machines lack the ability to define quality or establish standards, making it crucial to develop assessment designs that prioritize the distinctly human capacity for defining quality standards. This raises questions about assessment standards and points to a move towards more authentic, adaptive, and continuous assessment (Gašević et al., 2023). Adapting current assessment practices in response to the ubiquitous availability of generative AI tools is timely but also effortful. As AI continues to play a pivotal role in society, assessments need to be adapted to ensure that they assess students authentically and ethically. Assessment approaches that foster human expertise and judgment are primed to gain greater significance through digital technologies (Dann, 2014; Nieminen et al., 2023; Bearman and Luckin, 2020).
This moment presents a rare opportunity for real innovation in current assessment practices, because most commonly used assessments were not conceived with access to powerful generative AI tools in mind. To meet the moment, we need to understand educators' and students' perspectives on the issue to achieve sustainable advances in assessment practices. We are especially interested in how closely the perspectives of these two stakeholder groups, educators and students, align, since alignment would provide common ground. This may vary across contexts shaped by the local pace of technological adoption, institutional characteristics, cultural differences, and linguistic variation in technological efficacy (i.e., generative AI tools may work better in English than in other languages such as Greek). Within this context, we pose the following three research questions: (1) Which types of assessments do educators and students consider to be most impacted by generative AI? (2) How do educators and students think that students will be using generative AI in completing assessments? (3) What are their preferences and attitudes toward adapting assessments to incorporate generative AI? Answering these questions by building on an established framework for examining the quality of university online assessments is essential, since such knowledge is needed to guide efforts to reform future assessment practices.