Designing Programming Courses in the AI Era | Alternative Assessments for Genuine Learning


With generative AI becoming ubiquitous, programming educators face a critical challenge: designing assessments that measure genuine learning without being circumvented by AI tools. This article examines multiple methods—from attendance and code walkthroughs to timed coding tasks and oral exams—with explicit tradeoffs. We introduce an "AI-resistance" framework to evaluate each assessment's robustness against AI assistance, discuss scalability constraints, and explore which methods work at different course scales.

Opening

Professors and teaching staff who run programming courses face a new challenge in the AI era: how to design homework, projects, exams, and other assessments that effectively measure students' understanding and skills without being easily circumvented by AI tools. This article explores various methods and tradeoffs for designing assessments in programming classes that maintain academic integrity while still encouraging genuine learning.

Purpose

  • Preserve students' core programming skill development.
  • Measure genuine understanding rather than AI-polished outputs.
  • Incentivize time spent on learning-relevant work.
  • Maintain fairness and interpretability in grading.
  • Adapt to the AI era without giving up core learning objectives.
  • Keep the course operationally sustainable.

Constraints and Tradeoffs Teaching Staff Are Willing to Accept

Different delivery formats and assessment methods bring extra work for the staff and introduce scaling challenges. We need to be clear about what tradeoffs we are willing to accept in order to achieve the goals above.

  • Increased staff effort is acceptable if assessment quality improves meaningfully.
  • No format is fully AI-proof; the goal is robustness, not perfection.
  • Some convenience can be traded for more authentic evidence of understanding.
  • Different assessments may have different rules about AI use.
  • More complex grading is acceptable if rubrics remain clear and fair.
  • Attendance and participation can be rewarded when aligned with course goals.
  • The course should emphasize incentive design, not pure policing.
  • Any solution must still scale to the course's staffing and size.

Methods

Attendance

  • Attendance Sheet: A physical sheet passed around with a quick question (not a quiz) or a prompt to jot down one key takeaway from the lecture. This has worked well in 6.2000. The handwritten element makes it hard to fake remotely.
  • QR Code Signin: A QR code displayed in lecture for signing in, combined with a rotating code or quick question. This has worked well in 6.2500. The code changes every lecture, so students must be physically present.
  • In-class Polling: Tools like iClicker or Google Forms with a live question at a random point during lecture. This doubles as an engagement check and attendance record. The unpredictable timing discourages students from leaving early.
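
The rotating sign-in code can be sketched as a TOTP-style derivation: staff hold a secret, and the displayed code is a hash of the secret plus the current time window, so a screenshot sent to a remote student quickly goes stale. The secret, window length, and six-digit format below are illustrative assumptions, not what any particular course actually uses.

```python
import hashlib
import hmac
import time

# Hypothetical shared secret held by course staff (placeholder value).
SECRET = b"course-staff-secret"
ROTATION_SECONDS = 300  # e.g., a new code every 5 minutes


def current_code(secret=SECRET, now=None):
    """Derive a 6-digit sign-in code from the secret and the current
    time window (a TOTP-style construction)."""
    window = int((now if now is not None else time.time()) // ROTATION_SECONDS)
    digest = hmac.new(secret, str(window).encode(), hashlib.sha256).digest()
    return f"{int.from_bytes(digest[:4], 'big') % 1_000_000:06d}"


def verify(submitted, now=None):
    """Check a student's submitted code against the current window."""
    return hmac.compare_digest(submitted, current_code(now=now))
```

A real deployment would also accept the immediately previous window to tolerate students who scan right at a rotation boundary.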

Participation

In-class discussion where students answer questions or share thoughts verbally. This is inherently hard to shortcut with AI tools, but also hard to record and grade at scale: it requires TAs to circulate and note names, and the lecture pace is usually too fast for extended discussion.

One practical approach is structured small-group discussion: pause lecture for 2–3 minutes, have students discuss a question in pairs or small groups, then cold-call one group to share. TAs can circulate and note participation. This is lighter-weight than full-class discussion and scales better in large lectures. Another option is post-lecture reflection forms—a one-sentence takeaway submitted within 10 minutes of class ending—though this is more easily gamed with AI.

Realistically, participation remains the hardest component to assess fairly in large CS classes, and many instructors may choose to fold it into attendance or drop it entirely.

Homework

  • Traditional: Written answers and coding submissions, using Google Colab + Overleaf (LaTeX). However, students can easily use AI tools for both coding and writing, making this the most vulnerable format. The traditional homework workflow is the main vulnerability we aim to mitigate.
  • Oral Homework: Students record audio or video explaining their solution process. Chalk talk is a stricter variant: students explain on a whiteboard or paper in front of a camera, unedited, showing their real-time reasoning. This is much harder to fake because the student must demonstrate fluency with the material live.
  • Video Homework: Students prepare a short video explaining a concept or subtopic. This tests understanding and communication, but can be expedited by AI-generated scripts or even AI-generated audio/video.
  • Online Timed Coding Task: Students write code on a platform like HackerRank or LeetCode under a strict time limit. This directly tests coding ability under pressure. A known exploit is two students teaming up—one opens the test early and screenshots it for the other—but this can be mitigated by randomizing question order and answer options, or by using a proctoring tool.
  • Code Review / Walkthrough: Students submit code and then must explain it in a short 1-on-1 or small-group session with a TA. The TA asks follow-up questions ("Why did you use this data structure here?" or "What happens if the input is empty?"). This is operationally expensive but extremely effective at verifying genuine understanding, and nearly impossible to game with AI. Recent research has shown that shifting the focus of grading from code correctness to in-person demonstration of understanding can improve learning outcomes and better support computational learning in the age of generative AI (Wilson and Nishimoto 2024).
  • Iterative Submission with Diffs: Students submit multiple drafts over time, and grading considers the progression of their work. This makes it visible when a student jumps from nothing to a polished solution in one step, which is a strong signal of AI-generated work. However, this requires more grading effort and may be frustrating for students who prefer to work iteratively offline. A specialized online coding interface might be needed to make this feasible at scale.
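
The progression check on diffs can be sketched with the standard library: compare successive drafts and flag a pair where the code both changed almost entirely and grew dramatically in one step. The helper names and thresholds below are hypothetical and untuned; a real course would calibrate them on past submissions.

```python
import difflib


def progression_ratios(drafts):
    """Similarity between successive drafts (near 0.0 = rewritten
    wholesale, near 1.0 = nearly identical)."""
    return [
        difflib.SequenceMatcher(None, prev, curr).ratio()
        for prev, curr in zip(drafts, drafts[1:])
    ]


def flag_suspicious(drafts, min_ratio=0.2, min_growth=5.0):
    """Flag each draft pair where the content was almost entirely
    replaced AND the submission grew sharply in one step -- the
    'nothing to polished solution' jump described above."""
    flags = []
    for i, ratio in enumerate(progression_ratios(drafts)):
        growth = (len(drafts[i + 1]) + 1) / (len(drafts[i]) + 1)
        flags.append(ratio < min_ratio and growth > min_growth)
    return flags
```

A flagged pair is a signal for a follow-up conversation, not proof of misconduct on its own.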

Quiz and Exams

  • Traditional Written Exams: In-person, closed-book, handwritten. Still the gold standard for verifying individual understanding, though limited in testing practical coding skills. However, performance can sometimes depend more on luck: since an exam cannot cover all the course material, a student who reviewed only a fraction of it (say 25%) may score well simply because that fraction overlaps heavily with the portion that appears on the paper (say 20% of the material), rewarding selective cramming over mastery of the whole.
  • Open-book / Open-internet Exams: Students can use any resources but must finish within a tight time limit. The time pressure ensures that students who understand the material have a significant advantage over those trying to look everything up or prompt an AI in real time.
  • Oral Exams: Students explain solutions live to an instructor or TA, unedited. This is the strongest guard against AI assistance but the most expensive to administer. Works well for smaller classes or as a random audit on a subset of students.
  • Take-home Exams: Longer time window (e.g., 24 hours), open-resource. Essentially indistinguishable from homework in terms of AI vulnerability, so these should be paired with a follow-up oral component or code walkthrough to verify understanding.
  • Conceptual Short-answer Questions: Instead of asking students to produce code, ask them to trace through given code, predict output, identify bugs, or explain why a particular approach fails. These are harder for AI to help with under time pressure because the questions are context-heavy and visual.
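
As an illustration of the code-tracing style, a question might present a short function and ask students to predict its return value. This example is ours, not drawn from any particular course.

```python
def mystery(xs):
    """Students trace this by hand and predict the return value."""
    out = []
    for i, x in enumerate(xs):
        if i % 2 == 0:
            out.append(x * 2)   # even index: double and append to the end
        else:
            out.insert(0, x)    # odd index: push unchanged to the front
    return out

# Question: what does mystery([1, 2, 3, 4]) return?
```

Tracing by hand: the list evolves [2], then [2, 2], then [2, 2, 6], then [4, 2, 2, 6]. Pasting this into an AI tool yields an instant answer, which is why such questions work best on paper under time pressure.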

Projects

  • Report: Traditional project with written report and code. AI tools are generally allowed here since the focus is on idea, execution, and synthesis.
  • Poster Presentation: A poster session at the end of the semester where students present and defend their work to peers and instructors. The live Q&A component makes it resistant to AI shortcuts.
  • Oral Presentation: Students present their project in front of the class and answer questions. This tests both understanding and communication.
  • Video Presentation: Students prepare a polished video presenting their project. Less interactive than live presentations but useful for large classes.

The project component doesn't need to change much in the AI era, since projects are already AI-tool-allowed, and the value lies in the idea, execution, and presentation rather than rote coding.

The main focus for redesign is homework, quizzes, and exams—where we want to measure genuine understanding while making it hard for AI tools to substitute for real learning.

For attendance and participation, it really depends on the professor's preference. CS classes usually don't emphasize these compared to EE or other departments. Personally, I think students attending lectures and participating in class discussions helps them learn better and should be rewarded, even if the weight is small.

Summary of Methods and Tradeoffs

A short summary before we move on to the next section.

Let's define a scale called "AI-resistance" to evaluate how well an assessment method—or a course's overall design—can resist students using AI tools to complete it without genuine understanding.

AI-Resistance: Low
  • Example formats: Traditional written homework, take-home exams, final project with report only.
  • Description: Easily expedited by AI tools. Students can achieve high scores without truly understanding the material. Output-based grading cannot distinguish AI-generated work from genuine effort.
  • Tradeoff: Lowest cost to administer. Familiar to students and staff. Scales well.

AI-Resistance: Medium
  • Example formats: Oral homework recordings, online timed coding tasks, open-book exams with time limits, video homework, iterative submissions with diffs.
  • Description: More difficult to rely solely on AI, but workarounds exist. Students could use AI to prepare scripts for oral recordings, or team up to share screenshots on timed tasks. Time-pressured exams favor prepared students but don't fully block AI use.
  • Tradeoff: Moderate operational cost. Some formats (e.g., timed coding) require platform setup; others (e.g., video) require review time.

AI-Resistance: High
  • Example formats: Chalk talk (live, unedited whiteboard explanation), oral exams, code review/walkthroughs with TAs, in-class poster Q&A.
  • Description: Requires students to demonstrate understanding live and in real time. Follow-up questions make it nearly impossible to fake knowledge. These are the closest to a ground-truth measure of a student's understanding.
  • Tradeoff: Most expensive to administer and grade. Requires significant staff hours. May cause anxiety for students uncomfortable with live assessment.

How Hard the Course Should Be

There are two dimensions to consider: the difficulty of the material itself, and the difficulty of the assessments. The material should be challenging enough to push students to learn and grow, but not so difficult that it becomes discouraging or inaccessible. The assessments should be designed to fairly evaluate students' understanding and skills without being unnecessarily punitive.

MIT's grading policy states:

The grade for each student shall be determined independently of other students in the class, and shall be related to the student's mastery of the material based on the grade descriptions below. Grades may not be awarded according to a predetermined distribution of letter grades. For example, grades in a subject may not be allocated according to set proportions of A, B, C, D, etc.

Closing

References

Wilson, Sara Ellen, and Matthew Nishimoto. 2024. "Assessing Learning of Computer Programming Skills in the Age of Generative Artificial Intelligence." Journal of Biomechanical Engineering 146 (5): 051003. https://doi.org/10.1115/1.4064364.