One Way Teachers and AI Could Help Each Other Out
The edtech market currently has only two big ideas for interactions among teachers, students, and generative AI. Here is a third.
The edtech market currently has only two big ideas for interactions among teachers, students, and generative AI:
Teachers could use generative AI as a teaching assistant, independent of students, asking the AI agent to help them generate materials, plan lessons, assess students, etc.
Students could use generative AI as a tutor, independent of teachers, asking the AI agent to help them review old ideas or learn new ones.
In both cases, teachers, students, and the generative AI operate independently of each other. Khan Academy’s Khanmigo lets teachers review the transcripts of chats between students and the AI agent, but that seems designed more for student safety, for retroactive review in case of a complaint, than for any kind of pedagogical plan.
Neither of those models emerges from a particularly strong theory of learning or teaching. Both start from the premise of “here is what makes generative AI useful” rather than “here is what learners need” or “here is what makes teachers useful.”
So I was excited to see some new research from Dora Demszky’s lab at Stanford asking exactly that question: “How might the unique capabilities of teachers interact with the unique capabilities of generative AI?”
Kids get things wrong in tutoring sessions. Wang, Demszky, et al. wondered, “Okay, how do different agents respond?” A novice human tutor. An expert human math teacher. A generative AI agent. Here are some examples.
The researchers wondered, “how could the AI agent and the tutor support one another’s work?” On their own, tutors may not feel confident coming up with a useful response. On its own, the AI agent is too blunt.
The researchers decided that tutors are uniquely capable of looking at an error, categorizing it, and coming up with a specific intervention strategy depending on the context. Perhaps all the student needs is encouragement. Or maybe a hint. Maybe a worked example. Maybe a minor correction.
When the AI agents were provided with those strategy suggestions, humans rated the AI interventions 75% higher overall on measures of effectiveness, caring, and so on than responses from the AI agent operating alone.
For example, operating on its own, GPT-4 responded to the error above with a complete solution.
Actually, if Mike had 4 cookies and ate 3, we need to subtract: 4 - 3 = 1 cookie left.
Not useful for the student, who no longer has to do any work to get the answer. But if humans first equip the model with the strategy “provide a solution strategy,” the response becomes:
Well, that was a good try, but let's try breaking the problem into two steps: first subtract the cookies Mike ate, then see how many are left.
This table shows that those kinds of responses—AI nourished with strategy information from a teacher—had some of the highest ratings of any generative AI model. (See all the highlighted cells in the “strategy + GPT-4” row.) Human interventions outperformed AI interventions in every category (bold cells), except that humans rated “strategy + GPT-4” as more human-sounding than the humans themselves, which, ha ha, well, aren’t these very exciting times?
What will any of this mean for students and teachers? I have no idea. Ratings for “strategy + GPT-4” aren’t all that far from ratings for humans. If you squint at the situation real hard, you can make yourself believe that AI is getting good enough that novice tutors could use it to approximate expert tutors … provided the novice is willing to do this kind of work, to read a student response and select a strategy from a menu of options, then approve the generated response.
The difference between what research subjects will do in a laboratory environment, prepped and paid to do something novel, and what people will do when subjected to the demands of their job and the culture of that work is simply vast. It is exceptionally vast in schools, where the demands of the job are heightened beyond what outsiders can even imagine and shift by the day and the hour.
Will novice tutors be excited to partner with the AI agent in this way? Will they just start selecting the same default strategy every time? Will they just blow past the strategy selection screen to offer their own intervention, confident they know better? Who knows.
I’m only here to say that this kind of research—the kind with a literature review as heavy with education citations as technology citations, the kind with a theory of technology and a theory of learning—is the kind we should see more of.