AI's Delivery Problem & How Tutor CoPilot Solves It
Here is why teachers aren't using generative AI.
In this article, I’ll try to explain the biggest reason teacher usage of AI is low, and I’ll share one AI edtech project that has solved that problem.
First, you need to believe me when I say AI usage is low among teachers. AI startups have received the most expensive and extended marketing campaign of any consumer product in our lifetimes, and, almost two years later, teachers are still not using AI in numbers large enough to matter. RAND recently found that only 8% of teachers regularly use AI in their teaching and that 82% of math teachers have never used it. Anecdotally, I hear from AI edtech startup founders that their retention rates—the number of teachers who use their tools regularly—are far below industry averages. Teachers use these tools once but not twice.
To the degree that tech and business leaders reckon with this data at all, they offer explanations that are, to my eyes, unconvincing. They say that teachers just need more training, that schools just need to offer more guidance. The most hopeless among them use the low usage numbers to indict schools and their goals, arguing we need to change the nature of schooling to fit the strengths of AI.
None of those explanations holds up. The real answer is the delivery problem.
The Delivery Problem
If you want to mail a letter to a friend, the United States Postal Service solves your first- and last-mile delivery problem in a way that AI edtech does not.
The USPS comes straight to your door and picks up the letter. It goes all the way to your friend’s doorstep as well. It travels the first mile for you and the last mile for them. The USPS does not require you to get in your car and drive to the post office to mail the letter. It meets you way more than halfway, allowing you to do what you want to do: write and send a letter.
AI does not solve either the first- or last-mile delivery problem. Watch. A popular AI teacher copilot tool will create a lesson plan for me. It asks me for a prompt.
The first-mile delivery problem here is that I have been forced to leave out immense amounts of important context. What have the students learned previously? Is “dilation” a word they know and one we should lean on? Have they experienced similarity in the context of transformations? Are digital devices available? Are tools like Desmos a possible resource here? What kinds of lessons have students experienced recently? Do they need an introductory lesson? A concept development lesson? An application lesson? Do they need to spend more time on fluency? What curriculum are they using? Should we avoid decimals or fractions in the problems? What day of the week is it? What month of the year? How long has it been since the students had a break? Are they completely gassed or ready to put the pedal down?
My point here is that a teacher loads and processes petabytes of classroom context whenever they prepare a lesson. I can’t possibly type all of that information into a chatbot’s UI. This is the first-mile delivery problem.
And then you experience the last-mile delivery problem with the copilot’s output. What do you notice about the sections I have highlighted?
The teacher copilot tool has left the teacher with a significant last-mile delivery problem, one that may actually be impossible to solve. The teacher has to develop six or seven resources to support the demands of the lesson plan. For example, the lesson plan tells me to show “a short TikTok video showing how similar shapes are used in real-life scenarios” and I’m like, “Okay, but I came to you for that kind of thing!” This is as though the USPS delivered your letter to the town next to your friend’s and told your friend, “You can come and pick it up any time!”
The delivery problem explains why a teacher might adopt a chatbot initially but then abandon it. The work it does on the middle miles is, indeed, quite impressive. But the work teachers have to do on the first and last miles deters frequent usage.
Solving the Delivery Problem
If you want to see where developers have solved the delivery problem, you can look in two places. One, you can look at programmer copilot tools, where the AI is embedded in the programming environment.
When you ask the copilot to write some code for you, you will likely have to check the code for hallucinations, but you will not have to work to bring the code the last mile into your development environment. It’s right there.
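To make the contrast concrete, here is a rough sketch of that embedded workflow. The function and comments are my illustration, not any particular tool’s output: the programmer types their intent as a comment, the copilot proposes the completed code inline in the same file, and the only remaining work is review.

```python
# A rough sketch of the embedded-copilot workflow, not any specific tool's API.
# The programmer types their intent as a comment in the editor...

# compute the scale factor between two similar rectangles
def scale_factor(width_a: float, width_b: float) -> float:
    """Return how much larger rectangle B is than rectangle A."""
    if width_a == 0:
        raise ValueError("width_a must be nonzero")
    return width_b / width_a

# ...and the copilot proposes the function body inline, exactly where it will
# run. The programmer's only "last mile" is review: run it, check it, keep it.
print(scale_factor(4.0, 10.0))  # 2.5
```

There is no copying, pasting, or tab-switching between the tool and the place where the work happens.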
Two, you can look at education, where a group of researchers led by Rose E. Wang recently did something similar with Tutor CoPilot, an AI tool they built and studied. I think Wang and her team at Stanford embedded a lot of thoughtful pedagogy into their tool, but perhaps their biggest innovation was embedding the copilot at the point of the tutor’s use.
They wanted to help tutors give better feedback to students in a virtual tutoring environment. What they didn’t do is make tutors copy the chat transcript, open a new tab, navigate to an AI tool, log in, access the feedback helper, paste in the transcript, get the result, copy it, go back to the tutoring environment, and paste the result into the chat. Tutor CoPilot went the first and last mile for the tutor, embedding that help into the virtual tutoring environment itself.
In survey after survey in 2024, teachers have told the world that they are unhappy in their jobs—especially that they are overworked. I’m confident that the people building these AI teacher copilot tools would like to alleviate that work. But to my eyes, and to the eyes of all the teachers who have tried these tools once but not twice, these tools offer teachers homework rather than help, asking them to walk several extra miles to get any value at all.
Featured Comment
Nick Corley, a veteran teacher, on what he wants to do less and more of in his job:
The task of collecting paperwork and money needs to be taken away from teachers now! There are plenty of tools to collect information and money. Things like lesson plans, worksheets, and creating other resources should all be part of a sound resource/curriculum. If we want education to improve, we need to give teachers more time to improve instruction by being creative, working together, honing their craft, and honestly...resting.
Odds & Ends
¶ I had a blast chatting last week with Fonz Mendoza on his podcast, My EdTech Life. His lens is strongly informed by classroom experience, and his instincts for what will and won’t work for the 95% of students seemed sharp. You can consider my side of the conversation a 40-minute distillation of the last two years of my writing about edtech.
¶ In May 2024, the Bill & Melinda Gates Foundation asked people to send in their best ideas for using AI in math education. They received 200 responses, and Digital Promise has just now synthesized them into a report. Here is the most interesting chart.
With this email, I am critiquing the most popular idea: teacher assistants. Previously, I critiqued the second most popular idea—personalized content—which historically has done very little to improve achievement or interest in math. My team and I submitted an idea I am very excited about, one that I suspect was categorized as “Classroom Discussions.”
¶ Tracy Zager is an author and coach (and the editor of my forthcoming book!) and I appreciated her recent reflection on the relationship between teachers and the broken systems surrounding them:
I also know that doing that sort of work has become harder and harder to pull off because of the systems we are working in, and the ways they are breaking down. It’s awfully hard to have shared learning and teaching experiences if we can’t get coverage because there are no subs. If teachers come to the math lab completely overwhelmed because of other curriculum adoptions going on. If admin can’t participate because they’re in full triage mode, all day every day. If staffing shortages mean we are not meeting our IDEA requirements and we’re considering dropping to a four-day week.
¶ In an amazing victory for AI in education, Nevada’s use of AI led to a decline of 200,000 in the number of students deemed “at risk.” Wait [taps earpiece] my producers are telling me that AI didn’t help these students learn more or make their lives any less precarious. Rather, Nevada used a new AI system to reclassify its students, and the AI system said, actually, 200,000 of the students you thought were “at risk” aren’t actually “at risk,” including a bunch of homeless kids.
At the Somerset Academy’s Stephanie Campus in Henderson, Nev., more than 250 children are low-income, federal statistics show. About a dozen are homeless. But the principal, David Fossett, said that not a single student was identified as at risk under the new system.
An absolute horror show, honestly. Neoliberal rot. Technosolutionism. It’s all in there.
¶ Alex Grodd hosted a thoughtful debate about AI in education between Benjamin Riley (anti) and Niels Hoven (pro). I thought Riley did a fine job summarizing personalized learning’s terrible track record and asking Hoven to account for research showing that it works only for a slim minority of kids, kids we might assume are among the most motivated. The debate broke down once Hoven admitted his goal was to support highly motivated students. I don’t think there was much to say after that point.
¶ NCSM released a video of my four-minute speech accepting their award for national math leadership earlier this month.
Comments
I love your first/last mile analogy, but as a tutor, I don't think this Tutor CoPilot solves those problems for me. I couldn't quite tell from your example, but it seems like this is a chat that is only visible to the tutor, and the tutor is supposed to keep one eye on the chatbot's hints while also interacting with the student. I am not eager to bring this distraction into my tutoring sessions. And while these hints might help novice tutors in the short run, I wonder if they would hamper the tutors' long-term development, training them to seek answers from a bot rather than build their own teaching skills.
I'm also quite skeptical of the evidence that this had any effect (from the article you linked): "The study didn’t probe students’ overall math skills or directly tie the tutoring results to standardized test scores, but Rose E. Wang, the project’s lead researcher, said higher pass rates on the post-tutoring ‘mini tests’ correlate strongly with better results on end-of-year tests like state math assessments." If the chatbot is prompting tutors to ask questions that are highly similar to those in the program's internal assessments, that would skew the test results while having little impact on standardized test performance (or the underlying skills they measure). There are many retired standardized assessments available; I'd be curious to see how the students perform before and after the Tutor CoPilot intervention.
In general, I think non-tutors/teachers tend to place way too much value on identifying a student's misconception and providing good explanations. Question selection, rhythm, making the student "drive", and many other aspects are much more important.
The quote that stands out to me from the Tutor CoPilot article:
“But it is much better than what would have otherwise been there,” Wang said, “which was nothing.”
I feel like this is one of the lines that AI edtech advocates default to when they're compelled to acknowledge the problems with their products: the alternative is nothing; do you want the students to have nothing?
To me, this speaks to a remarkable narrowness of perception. It treats both students and teachers as tabulae rasae who bring nothing to the table that they are not explicitly trained to bring. In doing so, it neglects that there is always *something* that the use of edtech displaces, and that this displacement comes with costs that the actual (as opposed to advertised) benefits of edtech frequently fail to justify. And to connect with the critical point that Tracy Zager is making, it represents educational systems and institutions as immutable givens, such that the implementation of edtech is the only possible intervention.
We need a different approach.