Two Big Studies on AI in Education Just Dropped
Here is what they do and don’t say about teaching, learning, and technology.
Nigerian kids use AI to learn “nearly two years of typical learning in just six weeks.”
Many of you have alerted me to this study summary, often with a “😛neener neener!” attached. What the summary says:
Some teachers in Nigeria got some AI training.
Those teachers and their students got a booklet about AI.
The students got a free license to Microsoft Copilot.
They all worked together for six weeks.
The result of those efforts? An effect size of .3, which is better than 80% of interventions in international education!
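(A quick aside on the metric, for anyone who wants it made concrete: here is a minimal sketch of how an effect size like that, Cohen's d, is usually computed. Every number in it is invented for illustration; none of it is the study's data.)

```python
# A minimal sketch of Cohen's d, the standard effect size metric.
# All scores below are invented for illustration; they are not the study's data.
import statistics

treatment = [62, 58, 71, 65, 60, 68, 63, 59]  # hypothetical post-test scores
control   = [60, 57, 68, 62, 59, 65, 61, 64]  # hypothetical post-test scores

mean_t, mean_c = statistics.mean(treatment), statistics.mean(control)
var_t, var_c = statistics.variance(treatment), statistics.variance(control)
n_t, n_c = len(treatment), len(control)

# Pooled standard deviation across both groups.
pooled_sd = (((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)) ** 0.5

# d = difference in means, expressed in pooled standard deviations.
d = (mean_t - mean_c) / pooled_sd
print(f"Cohen's d = {d:.2f}")  # ~0.31 for these made-up scores
```

In plain terms, an effect size of 0.3 means the treatment group's average landed about a third of a pooled standard deviation above the control group's, which is exactly why what the control group was actually doing matters so much below.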
Look—if it were that easy to produce those kinds of learning gains, I’d be thrilled. But we know from decades of research that you don’t get that kind of effect size from one-off teacher professional development. It doesn’t happen. Was it the booklet, then? Of course not. So it has to be the artificial intelligence creating those results, and, I am very sorry, but please look at what these kids are doing with Copilot.
Students are asking Copilot to generate learning materials (e.g., examples of antonyms) that have existed for centuries. You can find examples of antonyms in print textbooks. You can find examples of antonyms in YouTube videos. You can type “examples of antonyms” into Google and get resources that are indistinguishable from what they’re generating with AI.
Can we be serious here? Can we take education just a little seriously? Can we put away childish things and ask ourselves: does this look better than 80% of interventions?
This is only a press release, and I look forward to the actual study. Let me tell you what I think is going on, though:
Students who were randomly assigned to participate in the program significantly outperformed their peers who were not, in all areas, including English, which was the main goal of the program.
The press release is quite vague about the activity of the control group. Researcher Rose Wang asked the study author to describe those activities and the conversation stopped there. In this tweet, the study author tells someone else, “it would be good to have another group.” Was there another group getting tutored after school without assistance from AI? I’m wagering there wasn’t.
If it winds up being the case that the treatment group benefited from (1) teacher AI training (2) a booklet about AI (3) Microsoft Copilot licenses, and (4) extra tutoring after school, while the control group received none of those resources, then the researchers did not actually study the effect of AI on the treatment group. And if I had to give a random kid any one of those four treatments, I would choose “extra tutoring after school” ten times out of ten. After all, tutoring has a pooled effect size of—oh hey where have I seen this number before—roughly .3.
If, as I suspect, the researchers did not control this study well at all, then I regret to inform you that I am adding dozens more AI guys to my Pay No Mind list, every one of them thrilling to the possibilities of this study, to the music and the magic, to the what if and can you even imagine, all while teachers and students are really struggling and really need support in the here and now.
If the researchers did make reasonable efforts to control the study, at minimum giving an equal group of randomly selected students after-school English classes without AI, then I will concede that I have misunderstood something quite important about teaching, learning, or AI.
[Many thanks to Michael Pershan for trading notes on this study.]
UK teachers use ChatGPT 3.5 in lesson planning and save 30 minutes weekly.
Teachers were randomly sorted into ChatGPT-using and ChatGPT-forbidden groups. The study was pre-registered and tightly controlled. Teachers logged how much time they spent each week on lesson and resource preparation, and the ChatGPT-using group reported saving 30 minutes weekly. Neat!
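(Here's roughly the shape of that comparison, as I understand it: a minimal sketch with invented numbers, not the study's data.)

```python
# A minimal sketch of the comparison behind "saved 30 minutes weekly":
# difference in mean self-reported prep time between randomized groups.
# All minutes below are invented for illustration, not the study's data.
import statistics

chatgpt_group = [240, 210, 225, 250, 230, 215]  # hypothetical weekly prep minutes
control_group = [270, 245, 255, 280, 260, 240]  # hypothetical weekly prep minutes

saved = statistics.mean(control_group) - statistics.mean(chatgpt_group)
print(f"Estimated weekly time saved: {saved:.0f} minutes")  # 30 for these numbers
```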
Also neat: it doesn’t seem as though teachers got any training aside from access to an asynchronous resource. That’s cheap!
Generally, this study confirms my hypothesis that, in education, AI is neat. As I have said previously:
The most optimistic outcome I can conjure for AI in education is that it will manifest as a series of quality-of-life improvements for teachers and students, similar to the AI grammar checker that is right now ensuring I maintain subject-verb agreement, but not much more than that.
Less neat: the study found that the ChatGPT-using teachers used ChatGPT less over time.
As is usual for a product that is still trying to find its fit in a market, the qualitative feedback is more interesting than the quantitative. I really can’t recommend the section “Barriers & Facilitators” on page 71 enough. Nothing but teachers talking about why they pick these tools up and then put them down. Interesting stuff.
“Most teachers agreed that ChatGPT was easy to use (81%).” That’s a medium-sized bummer for people who are convinced that teachers need more PD to show them how AI works. Teachers get it pretty quickly, folks!
“The most substantial limitation to motivation was the existence of pre-prepared lesson resources.” The difference between an LLM-generated lesson and a set of lessons made by a bunch of really creative people is still, in my conflicted opinion, incredibly vast.
“The two common themes were difficulties with reformatting ChatGPT outputs, and the inability of ChatGPT 3.5 to create diagrams. In the worst cases, reformatting or finding images negated or even outweighed the time saved by using ChatGPT.” This is the “last-mile delivery problem” I described in my ASU+GSV talk. It’s a big one.
Anyway
I feel a little smarter about all of this now. Hopefully you do too.
I'm waiting for the preprint, but here's something else from the blog post worth highlighting:
"After the six-week intervention between June and July 2024, students took a pen-and-paper test to assess their performance in three key areas: English language—the primary focus of the pilot—AI knowledge, and digital skills."
So that big shift in performance includes, I think, digital skills and AI knowledge. In other words, the study says that, compared to a group that spent no time with AI, this group's AI knowledge and digital skills (and English) improved. So...yeah, that would not really be an interesting result at any level, and anyone who boosted it should reflect on the relevance of this result.
Your articles prompted me to play with ChatGPT, but in science rather than math. So I asked for a lesson plan on friction. What I got was show-and-tell nonsense with some factual ambiguities. After three iterations the lesson plan began to look like a better learning experience. All I saved was some typing, not time, and sadly, since it doesn't learn from our conversation, it will be the same struggle with each new plan. Hopeless!!