ChatGPT-4o Will Be Great for Certain Math, Certain Thinking, and Certain Kids
Here are the important questions about OpenAI's product announcement that few people are asking right now.
On Monday, OpenAI announced GPT-4o, an upgrade to their ChatGPT technology. Claire Zau has a summary of the announcement that I found helpful. I’ll focus here on two videos relevant to math education.
In one video, several OpenAI employees sit around a table solving 3x + 1 = 4. They hold a cameraphone above a piece of paper and ChatGPT parses their work using OpenAI’s new multimodal technology and guides them to a solution.
In another video, Sal Khan’s son Imran Khan answers a question on Khan Academy about finding the sine of an angle in a right triangle. He’s streaming his screen to ChatGPT which, again, uses multimodal input to guide Khan to a solution.
ChatGPT asks Khan at one point, “Which side do you think is the hypotenuse?” and Khan indicates his answer by drawing on the screen. ChatGPT accurately interprets Khan’s drawings and responds with corrective feedback, a conversational pace, and a positive tone.
This is pretty neat. 👍
Look, folks. I’m not made of stone. This is impressive technology. I am strongly considering upgrading my previous assessment that “generative AI is neat” to “generative AI is pretty neat.”
Across social media, however, many tech and business leaders are claiming that these product changes are more than “pretty neat,” that they instead herald a revolution in learning. The revolution that they promised a year ago with GPT-3.5, the revolution that many of them promised throughout the 2010s with YouTube videos and autograded quizzes, those were not the real revolutions. Those promises should not be scrutinized too closely now, please. There has never been a Pundit Accountability Tribunal and now is not the time to create one. We should look forward, not backward. This is the actual revolution—for real this time.
Two things are both true:
These people may be right.
If they are, it will be an accident.
Tech and business leaders are deeply unserious right now. Rather, they are deeply serious about new technology and shareholder returns, but they are deeply unserious about the needs of learners.
I have digested hundreds of social media posts recapping these demos over the last two days. Only a small handful have made any attempt to answer any of the questions that any serious person should be asking here:
What kind of math is this good for?
What kind of thinking is this good for?
What kind of learner is this good for?
I’ll test these features more as they’re released more widely, but for now I can say with absolute certainty that GPT-4o will only help certain learners think certain thoughts about certain kinds of math. This is true of every new tool or medium. It is certainly true of the tools I have helped build.
We need to ask ourselves about every edtech product, “What is the warranty here?” It is never unlimited.
Certain Math
What do you notice about the math in both demos?
These are math problems that a student can solve using algorithms, math problems that result in a single numerical calculation that can be evaluated as correct or incorrect. “Calculation” is an important mathematical skill and if we could significantly improve a student’s ability to calculate (and all other considerations were equal) we would celebrate.
But calculation is to mathematics what chopping onions is to hosting a dinner party. It is important, but it is not central. It is not a prerequisite for other kinds of work. If chopping onions were a person’s primary association with hosting a dinner party, they would host many fewer dinner parties.
In addition to calculating with math, students need to argue, estimate, sketch, notice, wonder, construct, speculate, describe, evaluate, play, and so on.
Students need to calculate solutions to equations as in the OpenAI video, but also to create equations from a relationship. They need to use the sine ratio to calculate unknown side lengths as in the Khan video, but they also need to identify right triangles in the world around them.
It is possible that GPT-4o can help students learn that other mathematics, but it seems equally likely to me that calculation is just uniquely friendly terrain for GPT-4o.
Certain Thinking
Khan asks for help finding the sine of an angle in the triangle and GPT-4o asks:
Can you first identify which sides of the triangle are the opposite, adjacent, and hypotenuse relative to angle alpha?
This question focuses the student on small ideas. It breaks a big idea—the equivalence of these ratios across similar triangles—into smaller ideas, ideas which a student can memorize and master without understanding the big idea even a little.
This looks like success to many. To me it looks like someone has successfully diced an onion without understanding why we’re hosting the dinner party, what we hope our guests experience, or how we’re going to structure the evening.
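For readers who want the big idea spelled out, here is a minimal sketch of why the ratio is equivalent across similar triangles. The labels a (the side opposite angle alpha), c (the hypotenuse), and k (a scale factor) are assumptions for illustration and do not come from the demo.

```latex
\[
\sin\alpha = \frac{\text{opposite}}{\text{hypotenuse}} = \frac{a}{c},
\qquad
\frac{ka}{kc} = \frac{a}{c} \quad \text{for any scale factor } k > 0.
\]
```

Scaling the triangle changes both side lengths but not their ratio, which is why sine can depend on the angle alone. Naming the sides is the small idea. This invariance is the large one.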
We can focus students on larger ideas by asking other questions.
What is the question asking you to do?
What do you know about that?
What is special about this triangle?
What do you know about sine?
Perhaps this is only a matter of prompt engineering. Perhaps GPT-4o can be trained to focus students on those larger ideas. But I suspect it is equally possible that GPT-4o has been trained on the corpus of small ideas and step-by-step guides that pollute the mathematical internet. I suspect GPT-4o will struggle to engage a student in a conversation about those big ideas also because those ideas are hard to evaluate as either true or false. They are almost always both—true and false simultaneously.
Certain Learners
I wish it went without saying that the learners in these videos are not typical of US K-12 students. They do not represent the median student in age, education, socioeconomic status, or the desire to perform, especially here for cameras broadcasting to a worldwide audience.
What is typical? In a large 2018 study of thousands of students, Khan Academy reported that only 11% of participating students used its software for the recommended dosage of 30 minutes per week. This was in a study population of teachers who committed to helping their students meet that usage threshold.
We are moving at a very brisk pace away from the promises so many tech and business leaders made about their previous products, products that huge majorities of students placed on the curb next to the cans and bottles, on to new promises about their new products, all without asking, “Why didn’t the last product take for students?” All these startups crowding into the edtech AI space right now should wonder, “Which students are at all interested in talking to their computer about math?”
I think it’s an open question, but the assumed answer among tech and business leaders seems to be, “All of them! Why wouldn’t they be?”
I will tell you why.
First, students are much more interested in talking to people in their class, or texting people in other classes, than they are in talking to their computer about math.
Second, many students do not want to ask the question, “Hey GPT, check my work here. How did I do?” because ignorance is often bliss. Ignorance is often preferable to learning that you have more to learn.
I am trying to remain open to the possibility that new technology might one day lead to a drastic reduction in the number of necessary teachers in the world, or at least a drastic change in the kind of work they do. But this characteristic of students is why we have teachers. To encourage, cajole, and compel. To make new learning seem appealing and accessible. To convince students that math is for you and you are for math.
Many people describe generative AI as an infinitely patient tutor. They don’t understand that this makes generative AI an ineffective tutor.
It’s true that GPT-4o will patiently wait for you to ask for help. But effective teachers do not wait to be asked. Effective teachers know that many students will never ask for help, preferring to pass the time from bell to bell without bothering or being bothered by their teacher. Effective teachers are a bother! In these videos, you see GPT-4o waiting reactively for the learner’s invitation whereas a skilled teacher will proactively create their own invitation.
This is impressive technology, certainly, but to make even the junior varsity tutoring team (to say nothing about varsity tutoring and even less about the teaching team) it will need to respond to many more kinds of math and much larger kinds of ideas than we see in these demos. I find it much more likely that OpenAI will meet that challenge than they’ll meet the much larger challenge of providing learners with a tutor who is sensitive, persistent, and infinitely impatient as well.
Back in 2013, Kate Nonesuch wrote an article about the place of patience and kindness in teaching. I thought you'd like it: https://katenonesuch.com/2013/05/08/neither-kind-nor-patient/.
It also made me wonder why the ChatGPT-4o demos I've seen all use a female voice. In the video you linked with the linear equation, I cringed every time the presenters interrupted the AI. Logically, I know that they are just moving the program along (and GPT can be long-winded!), but it pains me to think that we might be training people to interrupt women even more than they already do.
"calculation is to mathematics what chopping onions is to hosting a dinner party. It is important, but it is not central."
Ahh, I have been trying to get this idea across to my 6th graders who believe that all they need is a calculator. I think this metaphor might actually get into some of their heads.