Authentic assessment in the age of generative AI

Image generated by DALL-E with the prompt ‘student working on assessment on their computer’

Michael Sankey

We have all struggled with the advent of Generative AI in higher education, and the last 10 months have been quite the ride, particularly in terms of how quickly this has evolved. But we are starting to see some clean air as the dust settles. In this piece I will first look at where we're going with this shift and what sits at the core of university concerns with artificial intelligence (AI), and then at some different techniques that can be applied to our assessment regimes to help counter the threats and work with the affordances.

For universities, the concern generally lies in academic integrity and how we think artificial intelligence is going to challenge us. Of course, we've been challenged in this space for many years by sites like Course Hero and Chegg, which are reportedly fronts for cheating services. These entities have been using their own generative AI tools to produce content for years, and have been using them to help students create (or buy) their assignments. The only difference is that the large language model processes they have used (or variations on them) are now available to everybody.

Purported cheating sites Course Hero and Chegg

Tangentially, I regularly get follow requests on my social media, from sites like Assessment Services UK, Aussie Assessments, etc., probably because the word ‘University’ appears in my profile. You will see below that they get to me through my Instagram feed, and Aussie Assignments even messaged me saying, ‘let me know if you need any type of assignment help we’re always ready to help you out and a very genuine price we are TurnItIn certified and guarantee 100% plagiarism free work’. 

Invitations to follow my Instagram https://www.instagram.com/michael_sankey/

Now imagine if I were a student under the pump and I got something like this: how tempting would it be? But I digress, as the elephant in the room for us is Generative AI, and that's what we're talking about here. The main protagonist, at least up to this point, has been the large language model behind ChatGPT. Of course, that's just the tip of the AI iceberg and makes up a very small part of the AI ecology.

Image created with the assistance of the AI engine DALL·E

Below we see Generative AI called out in this World Economic Forum diagram alongside educational skills and learning, but AI is also associated with leadership, with the future of computing, with entrepreneurship, with all walks of life. We know there are literally hundreds of these tools out there; in fact, there are more than 1,600 AI engines currently doing a whole range of things for people: images, video, coding, essays and so on.

From: https://intelligence.weforum.org/topics/a1Gb0000000pTDREA2?tab=publications

But AI doesn't always get it right. As you will see below, I did a bit of ego surfing, searching for myself at CDU. You will notice that not all of those pictures are of me. Though the text associated with each one says they're of me, looking a bit closer, the one down the bottom in the middle says that it's Michael Sankey, Professor and Director of Learning Futures and Lead Education Architecture at Charles Darwin University, but it is actually showing an image of the Prime Minister of Australia, Anthony Albanese. Well, AI hasn't got it right there; Google, you've mucked up 🙂

A Google search of my name brings up an image of the Australian Prime Minister, but says it's me.

So, we also know that things like ChatGPT are forcing us to rethink exams and online quizzes. For example, we have heard that ChatGPT can help you pass the Bar Exam. Ah, you say, but we now have the TurnItIn AI detector to help us. No, for as Sam Altman, the CEO of OpenAI, says, creating tools that perfectly detect AI plagiarism is fundamentally impossible, and so Altman warns schools and policy makers to avoid relying on plagiarism detection tools. We are seeing that already: TurnItIn's AI detection is not 100% accurate, more like 60-80% correct, so we can't take these results as gospel, rather just a guide.

I was recently at the THETA conference in Brisbane, where a futurist talking about AI said that AI is really great at the menial and mundane tasks, helping to create initial information from what already exists, but it's when we start using our right brain and making meaningful human interactions possible that we start to see creativity and growth in knowledge.

Anders Sörman-Nilsson (Global EMBA / LLB) speaking at the THETA Conference 2023

This requires us to be transparent about the decisions we’re making in relation to AI and what we, as a sector, are doing with these tools. For example, my university is saying, we think that AI is not an evil thing and that we need to learn to work with it, to teach our students how and when to use it and when not to use it. 

That means designing more authentic assessments, balancing essays and quizzes with other forms of assessment, and developing both AI-focused and AI-resistant assessments. We absolutely cannot run and hide, as these tools are with us and are now commonplace in our workplaces. In fact, as an educator, with the end goal of my teaching being to help students become employable, we should be teaching them how to use AI, and how to use these tools really well. That's our responsibility as teachers. Let's put ourselves, just for a minute, in the place of the employer. If I had six people working for me, and I knew that generative AI could make them more productive, would I be asking them to use it? You bet I would.

In other words, let the machines do what machines do well, and let's train our students (prospective employees) to both master the machines and extend (value-add on) what they can do so well. We do not have to reinvent what the machines can now do (why would we?); instead, we need to train our teaching staff to understand what is happening in the workplace and start to strategise how we can use our human intellect to extend what generative AI gives us.

In practice

Let's now look at different ways to do this. It's partly about teaching students how to read critically and how to evaluate and corroborate evidence: asking them to corroborate what they generate out of ChatGPT and to provide other evidence that would support what ChatGPT has given them. In other words, we look to distinguish between good and bad outputs. In saying that, we do not want to put ChatGPT or generative AI at the centre of education, but because it is such a useful tool, we want it to help us be more productive. Not to mention, it can also help us as educators generate helpful feedback on student academic writing.

One of my 'things' is that we can think of our students as productive beings, not just empty vessels waiting to be filled. Given the right inputs and encouragement, students are quite capable of producing original content very early on in their studies, and of solving real-world problems with their teachers. We already see this in some disciplines, like the creative industries, where students are expected to create new and original artefacts from day one. Another good example is the World Solar Challenge, where students solve real-world problems and extend what these technologies can do.

We know that Gen AI is a great launch pad for ideas and can help us understand different perspectives on a topic. It is also helpful for testing our ideas out, or seeing if they have solid backing. It's really great at formulas, and at generating computer code and prompts for things like spreadsheets. We are seeing it used to generate meeting transcripts, and some universities are starting to put in place Teams Premium, which will provide summary notes of meetings, tutorials or lectures on top of the normal transcript. This can give students more options if they miss a lecture or tutorial, so why not use these tools to help them? If it's going to help students comprehend the concepts, then more power to them. But this challenges our thinking about what is acceptable in the process of learning.

I mentioned earlier that students can be productive and can become creators. Related to this, Grammarly has introduced some advanced AI functionality into Grammarly Premium. Traditionally we have used Grammarly to help students check whether the grammar they're using is correct, but now the AI functionality actually suggests other things that could be introduced into the work. An amazing tool, but how is that going to work when we put the result into TurnItIn? I'm not sure we have done the tests on that yet, but in terms of simplifying complex information it's really helpful.

So what else is Generative AI useful for? Well, it's particularly helpful for neurodivergent students and for those who have English as a second language, and for students struggling to understand different concepts, it can aid their understanding. It can help students pivot from being consumers of knowledge to being creators of new knowledge, as we ask them to extend what they find and teach them new prompting skills. These new skills help them tease out information and then add their own insights and thoughts (extending knowledge).

Image created with DALL·E, using the prompt 'student working on computer at the end of the universe with spaceship'.

Let's take them to the edge of the known and then ask them to go further. What better time to do this than while they are in the safe environment of their studies? Let them make mistakes and learn from them, better now than when we release them into the workplace unprepared for this new order.

In a recent LinkedIn post, Josh Cavalier sees ChatGPT as being like the meat in the sandwich. That is, we start with our human prompt engineering, ChatGPT then gives us the information we're asking for, and then we have to review, adjust and modify, do the grammar checks and plagiarism checks and things like that across this information. So we give it the input, and then the possibilities become seemingly endless.

From a Josh Cavalier LinkedIn post

Authentic assessment

Now we finally get to talking about designing authentic assessments. We've had a particular focus on this within the sector for more than 10 years now, trying to think about how our assessments can become more genuine. Fundamentally, we know that our students, when they participate in higher education, come with a genuine interest in the subjects they want to study. So, in some respects, there is a value proposition in this for them: that they complete their studies themselves. Of course that is not always the case, but generally we see that students don't want to cheat; we've got evidence that plenty of students only cheat when they find themselves under extreme pressure (Rundle et al., 2023).

But to some degree the onus is on us to design assessments that are fundamentally interesting and relevant to students' future careers, which means future-focused, so that (hopefully) the value proposition to cheat is lessened as they see a reason to learn what they need to know. To do this we need to be clear about the tasks we want them to do. We want to encourage them in their learning, which means we scaffold this information for them, so they're not left floundering and having to use an AI tool to interpret what we are requiring.

But let's not fool ourselves: students will use generative AI, and are using it already, so we do need to develop some new types of assessments that are going to help us, and them, meet their learning outcomes.

A key concern I hear about generative AI is that students won't understand what they're submitting: that they will basically copy and paste what ChatGPT (or another AI engine) gives them and submit it without fully understanding it. And that is true; if we make it possible for them to do so, they may well do this, and why not?

This means we need to balance out this kind of written work with other forms of assessment. For example, in-person oral presentations, or what we call vivas, which can't, at this point at least, be produced by an AI: asking students to tell you in their own words what they're doing, usually face to face or over Zoom. It may mean replacing or supplementing essays with other assessments, so you are not relying on the essay alone. As we introduce other elements associated with the essay, such as visuals, images or infographics, it helps students conceive visually (metacognitive processing) what they're doing in their written work, allowing them to talk to that visual and make sense of it.

There are other good reasons to vary written work and introduce other forms of assessment. By augmenting assessments with oral communication skills we are helping prepare our students for life after study, as these skills are enormously valued in the professions. Yet for some reason higher education has undervalued oral skills for decades, which is really very curious, as that's where we came from: the Socratic model of education is all about discussing and honing your thoughts through talking with and questioning each other about the things that matter. Things like this sit at the very heart of education: a student having the discussion with the academic and fellow students, rather than doing a quiz or an essay.

So does that mean this is the end of writing skills? Absolutely not.  

But there's more to it than just saying "oh, but if students aren't going to write essays, they're not going to have the ability to rationalise and process information". Poppycock. Sure, we do have a recent history of ways to do things (essays, quizzes and exams), but that shouldn't stop us being more circumspect about how and what we write, to develop more AI-resistant assessments. If we are aiming for our students to demonstrate understanding, as opposed to reciting knowledge, independent of written work, then we need to look at other ways, rather than going back to pen-and-paper exams.

Face-to-face learning, in the uni I work in, is the exception, not the rule. Some 70% of our students study at a distance, so that's not an option. That might be different in your particular case, but it is still about shifting away from things like exams to more authentic assessments. Why? Because it's become so much easier to cheat with the use of technology, and there are lots of sites out there to help you.

So are we asking students to demonstrate real-world skills, you know, those ones that might be used in the workplace? Funnily enough, workplaces still require employees to write things; they might have to write a report, and they'll write emails and keep records. Interestingly though, my wife has been a psychologist for 20+ years. Over the course of her degrees she wrote 50 or so essays, but since leaving study she has never written one. She has written reports and case notes and things like that, but never an essay. So are we asking students to do real-world activities in their studies? Not that I'm denigrating the essay, as it does hold an important place in what we do, but it's about how we design those essays to be more authentic, particularly in terms of where we think our students will end up.

Defending one’s stance

Being able to understand and communicate ideas and defend one's stance on a particular topic is important. To support and articulate what is in an essay (regardless of whether ChatGPT was used), a 5 to 10 minute 'viva voce' is all one needs. Let's face it, that's generally all you will get in a real-world meeting to present your case. So we do this with our students, as that is what will be expected of them when they have finished and are fortunate enough to get a job.

For example, consider an assignment that asks for deep research on recent developments. We know that ChatGPT is not great on recent events. So as a teacher you may think about what happened in the news yesterday or last week that could be relevant, that ChatGPT will not know about but you do, and work with those recent applications of the knowledge. Or you might see something in a recent journal article that ChatGPT cannot access or doesn't yet know about. The trick is, what worked last semester may not work next semester, as ChatGPT (or other tools) may have caught up, so check to see if it has. In other words, continually review your material so that you can set personalised and complex topics.

By way of extension to this:

  • we can ask our students to include their personal experience in the perspectives they are writing about, 
  • maybe ask them to analyse a class discussion for example, 
  • or provide a summary of the tutorial,
  • sum up complex instructions involving long texts that don't fit the typical chat prompts, 
  • ask them to write about a very recent event (mentioned above) in the last week or so that can’t be found on ChatGPT.

But test it first: go in yourself and check how ChatGPT handles it. It doesn't take long.
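If you're comfortable with a small amount of code, this check can even be scripted rather than done through the chat window each time. What follows is only a rough sketch, assuming the official openai Python package and an API key stored in the OPENAI_API_KEY environment variable; the question text is a made-up example of the kind of recent, personalised task described above. Pasting the question into the ChatGPT web interface achieves the same thing.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A hypothetical question in the style suggested above: recent, local, personal.
question = (
    "Analyse last week's class discussion on the new campus policy, "
    "drawing on your own notes and personal experience."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # or whichever model your students are likely to use
    messages=[{"role": "user", "content": question}],
)

# If this output looks like something a student could submit as-is, the
# question probably needs more grounding in recent or local material.
print(response.choices[0].message.content)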

Essays

So, if we are going to persist with essays at this point, then at least provide students with the readings you want them to use, preferably from your uni's databases, not from out in the wild (as ChatGPT will know about those); in other words, journals that the student has to log in to in order to access the content. It's good practice to provide them with the readings so that they can't just use ChatGPT, which will go really wide, pull in a whole lot of material and not reference it properly. But be aware that ChatGPT Plus can analyse documents that are uploaded to it.

If you are concerned that they may have just copied and pasted their content from ChatGPT, you can ask them to submit a Word version, not a PDF. The Word version usually has a version history associated with it, so you can see when they created the document, and if they have copied and pasted something in, that becomes obvious. The history will show you the process they've gone through. We can do that now, as we've got the technology to do it.
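For anyone inclined to script this, a document's basic properties (author, creation date, revision count) can also be read programmatically. This is only a first-pass check, a sketch assuming the third-party python-docx package and a hypothetical file name; the full editing history still needs to be viewed in Word, OneDrive or SharePoint itself.

from docx import Document  # assumes: pip install python-docx

doc = Document("student_submission.docx")  # hypothetical file name
props = doc.core_properties

# A file created minutes before the deadline with a revision count of 1 is
# not proof of anything on its own, but it may be worth a conversation.
print("Author:       ", props.author)
print("Created:      ", props.created)
print("Last modified:", props.modified)
print("Last saved by:", props.last_modified_by)
print("Revisions:    ", props.revision)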

Or ask them to reflect on what they learned. You can do this by asking them to write a 300-500 word summary of what they learned through the process of writing the essay. Just be aware that ChatGPT can write reflections too, which is why these need to be tied to a task you have given them.

Multiple choice questions

So, we know an online quiz can be really problematic now. Why? Because students can have their computer open and their phone open, and can easily find the answers. This means we need to think about presenting our questions in a different way. For example:

  • Consider using images, figures or charts, where students then have to identify something from the image. For example, you might ask which section of the figure below demonstrates such and such. They can't easily put that into ChatGPT. 
  • Or present questions with auxiliary visuals, or as hotspot questions, where you ask the student to select the right place on the image (the hotspot), for example where a particular element resides. ChatGPT can't do this. So, 'select the area of the image which shows such and such'.
  • You can present questions using a series of images or videos, accompanied by some conditional logic or a branching question. For example, most lecture capture systems or streaming media platforms, such as Panopto, Echo360 and Kaltura, have in-context quizzes. You can place the quizzes in the video, so that students have to answer questions about the content in the tool itself.
  • You could consider giving an answer and asking them to choose the right question.
  • If you do use a traditional quiz, follow it with a 5-minute viva where you ask two or three random questions from the quiz, just to feel confident that they do know the answers.

It might be that you present questions that require a student to apply a concept or principle to an up-to-date scenario. For example, you might say, "The Voice Referendum legislation went through Parliament a few weeks back, but there were those who spoke against it; what are the implications of this, and what are the implications of a dissenting voice?" Using something really recent like this, which ChatGPT won't know anything about, you could do this in a quiz quite happily.

If you have to use a quiz, at least check the answers yourself on ChatGPT first.
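If your question bank is long, that checking can be partly automated. Below is a minimal sketch of the idea, again assuming the openai Python package and an API key; the questions and expected answers are made-up placeholders, and a human still has to judge whether the model's attempt would actually earn the marks.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical quiz items: (question, expected answer) pairs from your bank.
quiz_bank = [
    ("Which organ produces insulin?", "The pancreas"),
    ("What does the acronym LMS stand for?", "Learning Management System"),
]

for question, expected in quiz_bank:
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
    )
    answer = reply.choices[0].message.content.strip()
    # Print the model's attempt next to your answer key; questions it nails
    # easily are the ones worth redesigning or pairing with a short viva.
    print(f"Q: {question}\nExpected: {expected}\nChatGPT:  {answer}\n")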

Non-text-based

Consider options for your assessments that are not text-based; in other words, alternatives to essays and quizzes. There is plenty of information out on the web about this. For example:

  • Using impromptu video presentations for assessment 
  • Creating a podcast, or using other digital forms such as animations, mind maps, or infographics 
  • Using tools like Padlet or VoiceThread, which make your students participate rather than just regurgitate information
  • Aural reflections rather than written ones, asking students to self-assess and explain how they came to their conclusions

The power of the peer

Within education we love the peer.  I will submit a journal article and it will go out to a couple of my colleagues and they will evaluate and assess it and maybe suggest a few things to improve it. 

Did you know we can do the same thing with our students? Many of us have systems in place that allow our students to do just this (peer assessment). Some LMSs have this in a rudimentary way, and then there are specialist tools like FeedbackFruits, Spark Plus or even PebblePad.

In simple terms, we ask our students to submit their work and this work is assessed by two (or more) other students. In this way each student is exposed to the work of two other students, so not only are they learning from doing the work themselves, but they are also learning how two other students have dealt with the same topic and whether it matches their own understanding.

For example, if we want to use Generative AI just as we might in the workforce, then engagement with one's peers is a common part of that: ideas are shared, and feedback is received and incorporated into a final piece of work.

Although AI itself does not facilitate peer collaboration, the outputs from it can. This then can become a negotiated activity, where students are asked to validate the outputs that Generative AI might provide, then alter, and/or further test the concepts. For example: 

  • In Physiotherapy, students may be asked to generate a treatment plan for somebody who has damaged their sacroiliac joint. They are asked to use AI to generate a response that might determine the treatment to be used. Each student does this task independently. They then post their responses into a peer-assessment tool, and each student is asked to peer-review the work of two other students, validating whether the treatment plan is appropriate and providing feedback to the other two students. In this way the student is exposed not only to their own work but also to the work of two other students, thereby extending the opportunity to engage with the concepts. By way of extension (only), each student then submits a final piece of work that takes into account what the other students have commented on.
  • In journalism, students may be asked to take a current event (within the last 3 months) and place it in an historical context. It is unlikely that Generative AI will have a current record, but it will be able to generate the historical context. Depending on the technologies available at an institution, this activity can be done in a shared document in OneDrive, or on a wiki, or in any shared space where students can provide feedback in the document itself. Students each do the activity, but are then asked to improve two other pieces, as an editor might do in a newsroom. The final piece can acknowledge the input from the other two students. Sourcing or generating images to suit this activity could be an appropriate extension to the task.

Interactive assessments

Interactive assessments engage students in group activities.  Now this can be done face-to-face and in an online environment, maybe in something like Microsoft Teams or Slack. This may involve asking them to research and analyse some data and then to co-construct scenarios.  

From the educator's perspective, we might consider emphasising the process of learning rather than the outcome, thereby concentrating on that learning process, not on the fact that they've answered the question. As long as students can articulate what they've learned along the way and how they've processed that information, we don't need to treat each piece of assessment as the end of the journey; it's actually the evolution of thought. This is particularly effective in programmatic assessment, or assessment that is followed across a program of study rather than broken into isolated, unrelated units. This opens the door to more interaction around a sustained and interactive approach to assessment.

As part of this, how would it look if we got our students to submit drafts of what they are working on, or something along those lines? To that I hear the voices: "I haven't got time in my workload for that". But at the end of the day aren't we teachers? Drafts allow students to be corrected and put back on track earlier, making sure they really grasp what is being asked of them. It would also identify any who are struggling and requiring extra attention. To do this we must consider redesigning our assessments so that we can fit this workload in. Maybe that means fewer assessments. But things that demonstrate thought creation over time and assess critical thinking, while avoiding classic formulas, could well be of more value than a quiz. Assessments that are formulaic, or that have been used repeatedly, are really easy to cheat on. Instead, maybe we consider assessments that are open-ended and continue over the course of the semester, rather than the one-off big piece of work.

Assessments that are iterative stimulate the formulation of original ideas and, potentially, the development of research questions. Getting students to define the research questions, or elements of a question, then creates ways to assess students' deep critical thinking skills. Or, ask them to:

  • Critique a paper, or even some text that's been generated by ChatGPT; e.g. get them to generate the text, then critique it to see how accurate it is. This requires them to go off and find references that might support their finding, or otherwise. 
  • Build logical arguments that demonstrate reasoning skills, maybe by asking them to create a concept map, flow chart or diagram. This could be a set of virtual sticky notes in Padlet or Miro, or whatever; the trick is to help them unpack the topic into its component parts.

Importantly, we take advantage of some of the shortcomings of generative AI to help our students understand that it does have shortcomings. Not so they don’t use it, but so they use it with open eyes. We do this by:

  • Focusing on current and factual events and locating accurate and genuine data sources, then asking them to test these outputs and to provide the evidence to back up what ChatGPT found. 
  • Asking students to cite and reference the work they find on ChatGPT and insert in-text citations in the appropriate places.

This is not new, but it's turning things around a bit. Let's face it, who hasn't written something and then gone looking for a citation to support it? Be honest 😉

But is this all too hard? Maybe we just keep going with the easy things, like essays and quizzes. Please don't; there are so many more options around authentic assessment that we can ask students to engage with, again, like critiquing a piece of writing generated by ChatGPT.

Beyond assessing cognitive skills, we can incorporate assessments that draw on other skills like collaboration, communication, leadership, role play, etc., requiring students to process information in different ways.

Maybe we can also think about how we could re-weight our assessments, so that those assessing lower-order thinking skills are weighted lower. Uncritically, we tend to say, "okay, well that bit of assessment is worth 20% and that's worth 40%". But is this based more on tradition than on the level of critical thinking skills being required? If so, we could now review those weightings.

We could also think about using:

  • Compare-and-contrast sharing activities 
  • An analysis of a virtual or augmented reality simulation that ChatGPT does not know about
  • Evidence-based portfolios, such as an institutional ePortfolio 
  • If using essays and reports, requesting direct quotes and multiple citations
  • Oral assessments (viva voce)

I just want to finish off by reiterating the words of the Chief Commissioner of TEQSA, Professor Peter Coaldrake, who said, "The power of generative AI tools requires a deep rethink of approaches to teaching and learning and assessment practices and how higher education institutions are ensuring that students have attained the skills and knowledges they need to graduate with their Awards".

So it's our responsibility to make sure they have the skills and knowledge they need to graduate, and with this powerful tool called AI, whether that's ChatGPT or the other tools like it, we need to be very conscious of what we're doing in our assessment practices.

In conclusion, these are my top ten priorities for assessment:

  1. Reduce emphasis on final high-stakes exams
  2. Reduce the propensity for widespread quizzes in important assessments
  3. Look for opportunities for course-wide assessments (alignment across units)
  4. Weight assessment items in line with the level of learning (low weighting for low-stakes)
  5. Increase emphasis on formative assessment feedback 'for learning' (feedback literacy)
  6. Design active, collaborative, authentic assessment
  7. Increase the use of WIL, group and peer assessment
  8. Design assessment for inclusion
  9. Increase the use of multimodal assessments
  10. Reduce essays and long-form text that can easily be cheated on

References and further reading

These are some of the works I have drawn on. I haven't explicitly referenced them against each concept, because this is my own reworking of many of those ideas, but these are some of the documents that have helped form my thinking.

Please cite: Sankey, M. (2023). Authentic assessment in the age of generative AI. Michael Sankey WordPress Blog. Available from: https://michaelsankey.com/2023/09/21/authentic-assessment-in-the-age-of-generative-ai/
