One Way Teachers and AI Could Help Each Other Out
The edtech market currently has only two big ideas for interactions between teachers and students and generative AI. Here is a third.
The edtech market currently has only two big ideas for interactions between teachers and students and generative AI:
1. Teachers could use generative AI as a teaching assistant, independent of students, asking the AI agent to help them generate materials, plan lessons, assess students, etc.
2. Students could use generative AI as a tutor, independent of teachers, asking the AI agent to help them review old ideas or learn new ones.
In both cases, teachers and students and the generative AI operate independently of each other. Khan Academy’s Khanmigo lets teachers review the transcripts of chats between students and the AI agent, but that seems designed more for student safety, for retroactive review in case of a complaint, than for any kind of pedagogical plan.
Neither of those models emerges from a particularly strong theory of learning or teaching. Both start from the premise of “here is what makes generative AI useful” rather than “here is what learners need” or “here is what makes teachers useful.”
So I was excited to see some new research from Dora Demszky’s lab at Stanford asking exactly that question: “How might the unique capabilities of teachers interact with the unique capabilities of generative AI?”
Kids get things wrong in tutoring sessions. Wang, Demszky, et al., wondered, “Okay, how do different agents respond?” A novice human tutor. An expert human math teacher. A generative AI agent. Here are some examples.
The researchers wondered, “how could the AI agent and the tutor support one another’s work?” On their own, tutors may not feel confident coming up with a useful response. On its own, the AI agent is too blunt.
The researchers decided that tutors are uniquely capable of looking at an error, categorizing it, and coming up with a specific intervention strategy depending on the context. Perhaps all the student needs is encouragement. Or maybe a hint. Maybe a worked example. Maybe a minor correction.
When the AI agents were provided with those strategy suggestions, human raters scored the AI’s interventions 75% higher overall on measures of effectiveness, caring, etc., than they scored responses from the AI agent operating alone.
For example, operating on its own, GPT-4 responded to the error above with a complete solution.
Actually, if Mike had 4 cookies and ate 3, we need to subtract: 4 - 3 = 1 cookie left.
Not useful for the student, who no longer has to do any work to get the answer. But if humans first equip the model with the strategy “provide a solution strategy,” the response becomes:
Well, that was a good try, but let's try breaking the problem into two steps: first subtract the cookies Mike ate, then see how many are left.
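For readers curious what this human-in-the-loop pipeline might look like in code, here is a minimal sketch, assuming the tutor picks an intervention from a fixed menu and that choice is injected into the model’s system prompt. The strategy menu, prompt wording, and function names here are my own illustrations, not the prompts used in the study.

```python
# A minimal sketch of tutor-selected strategies steering a model's response.
# The strategy labels and prompt wording below are illustrative assumptions,
# not the prompts from the Wang & Demszky study.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical menu of interventions a tutor might choose from.
STRATEGIES = {
    "encourage": "Offer brief encouragement. Do not reveal any math.",
    "hint": "Give one small hint that points toward the next step.",
    "worked_example": "Walk through a similar problem with different numbers.",
    "solution_strategy": "Describe a plan of attack without computing the answer.",
}

def respond_to_error(problem: str, student_answer: str, strategy_key: str) -> str:
    """Generate a tutoring response conditioned on a tutor-selected strategy."""
    strategy = STRATEGIES[strategy_key]
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": (
                    f"You are a math tutor. Intervention strategy: {strategy} "
                    "Never state the final answer outright."
                ),
            },
            {
                "role": "user",
                "content": (
                    f"Problem: {problem}\n"
                    f"Student's incorrect answer: {student_answer}"
                ),
            },
        ],
    )
    return response.choices[0].message.content

# The cookie problem from above, with a hypothetical wrong answer,
# and the tutor choosing "provide a solution strategy."
print(respond_to_error(
    "Mike had 4 cookies and ate 3. How many does he have left?",
    "4 + 3 = 7",  # hypothetical student error
    "solution_strategy",
))
```

The point of the design is the division of labor the researchers describe: the human does the diagnosis, which here is a quick menu selection, and the model does the wording, which is cheap for it and expensive for a novice tutor.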
This table shows that those kinds of responses—AI nourished with strategy information from a teacher—had some of the highest ratings of any generative AI model. (See all the highlighted cells in the “strategy + GPT-4” row.) Human interventions outperformed AI interventions in every category (bold cells) except that humans rated “strategy + GPT-4” as more human-sounding than humans, which, ha ha, well, aren’t these very exciting times?
More please.
What will any of this mean for students and teachers? I have no idea. Ratings for “strategy + GPT-4” aren’t all that far from ratings for humans. If you squint at the situation real hard, you can make yourself believe that AI is getting good enough that novice tutors could use it to approximate expert tutors … provided the novice is willing to do this kind of work, to read a student response and select a strategy from a menu of options, then approve the generated response.
The difference between what research subjects will do in a laboratory environment, prepped and paid to do something novel, and what people will do when subjected to the demands of their job and the culture of that work is simply vast. And exceptionally vast in schools where the demands of the job are heightened beyond what outsiders can even imagine and their categories change by the day and hour.
Will novice tutors be excited to partner with the AI agent in this way? Will they just start selecting the same default strategy every time? Will they just blow past the strategy selection screen to offer their own intervention, confident they know better? Who knows.
I’m only here to say: this kind of research, the kind with a literature review that’s as heavy with education citations as technology citations, the kind with a theory of technology and a theory of learning, let’s see more of it.
Dan, I could not agree with this more. AI can never ever be a substitute for teachers, great teaching and, perhaps most importantly, the relationships of trust, belonging and identity that adults transmit to children. The question for me is: can AI help humans optimize these relationships by carrying some of the load of sharing content, addressing the hidden complexity created by large class size and the work it creates, and the need to personalize content? The ability to diagnose error in real time, IMHO, is a power that can enable all teachers to play an even greater role in the lives of their children. That is why as we think about introducing AI into classrooms, it is critical to think through the use cases where it will work and where it WON'T work. It is critical to acknowledge that learning is a social, collective process as much as it also requires deliberate practice to master skills. These understandings can only be co-constructed by developers, teachers, students and their families. As you suggest, lab settings are not, in this case, a place for learning. Keep being thoughtful and provoking! I love reading your work. Bob Hughes
It's refreshing reading an acknowledgement of what it can be like teaching sometimes, "in schools where the demands of the job are heightened beyond what outsiders can even imagine and their categories change by the day and hour."
Last Friday, for example, a student found a bug on his sweatshirt in the middle of a quiz and everyone in the room immediately had something to say or do -- including another student falling out of his chair with excitement -- while the student tried (unsuccessfully) to crush the poor insect against the wall with his workboots. Not all situations provide enough time to consult AI haha. Fortunately, I remained calm, gave the class five seconds to quiet down, reminded them of class expectations, and whisked the insect outside in a Tupperware (it was fine) while my co-teacher watched the class quietly continue their quiz.
So many aspects of teaching relate to setting expectations and working with a range of behaviors in addition to teaching students mathematics!
What you write about is pretty wild and I could see it being useful for the calmer moments, even for novice teachers, not just novice tutors. One thing I'm curious about: how long before the strategy is built into the GPT model, or will we need to continuously think of good prompts to help adjust responses? I'm not super familiar with GPT despite having played around with it: are there different libraries or structures expert teachers have made already that you can upload, or even copy and paste in, to give GPT a good starting point on teaching strategy?