
Why We're Teaching Machines to Be Funny While Forgetting How to Laugh at Ourselves

AI can recycle the same 25 jokes forever, outscore most humans at humor tasks, and still not know when to shut up. What machines are learning about comedy — and what comedy reveals about the machines.

Photo by Thong Tran (Unsplash), edited and rendered by gpt-image-2

My grandmother once told a joke so bad that my cousin, then six, looked up from his mashed potatoes and said, "Grandma, that was not a joke. That was a sentence." He was right. He was also, arguably, doing what most large language models still cannot: distinguishing a joke from a sentence that merely resembles one.

I've been thinking about this a lot, mostly because we are now several years into the era of asking machines to be funny on purpose, and the results have been, how do I put this charitably, a sentence.

The Stoic Machines Have Entered the Open Mic

Here is the situation as best I can describe it without making my editor cry. AI has slid into nearly every chamber of daily life. It writes our emails, sorts our groceries, recommends the songs we pretend not to cry to. It is, by most measures, competent. What it is not, and what it desperately wants to be, is charming.

In 2023, researchers at Cornell University used hundreds of New Yorker Cartoon Caption Contest entries as a testbed and found that humans outperformed machines on every task: caption matching, quality ranking, and joke explanation. The best AI achieved 62% accuracy on caption matching, versus 94% for humans. But the gap was closing fast. A 2024 study by USC researchers Gorenz and Schwarz, published in PLOS One, found that ChatGPT 3.5 outperformed the majority of human joke-writers across three different humor tasks, ranking funnier than 63–87% of participants depending on the type of joke. Translation from academic into English: by 2024, the machines had overtaken the average person at the dinner table, even if the comedy writers still had an edge.

Around the same time, two researchers in Germany asked ChatGPT to tell them jokes. Over a thousand of them. They found that more than 90% of the 1,008 generated jokes were the same 25 jokes, recycled like a desperate uncle at Thanksgiving. (The favorite, reportedly, was the one about the scarecrow winning an award for being outstanding in his field. Yes. That one. The one your dad already told you.)

The USC/PLOS One paper noted that earlier work on ChatGPT's humor, while interesting, hadn't examined the model's joke-making "in ways comparable to humans' abilities", which is researcher-speak for we are still figuring out how to grade this fairly, and also, oof. A study from the University of Illinois put it more bluntly, observing that "knowing when to joke, and when not to, is a subtle skill often missing in large language models."

Knowing when not to joke. Imagine telling that to your aunt at a wedding.

Humor as the Last Human Moat (For Now)

Here's what makes humor such a stubborn, beautiful problem. A joke is not a sentence with a twist; it's a tiny conspiracy between two minds that have agreed, for one second, to see the world wrong on purpose. That conspiracy requires context, timing, social risk, and the willingness to be embarrassed. Machines historically struggle with three of those four, and have no concept of the fourth.

But progress is sneaking in through the side door. A 2023 system called Witscript 3 generated conversational jokes that human evaluators rated as actual jokes 44% of the time. Those are not stand-up-special numbers, but they are a respectable showing for a machine improvising into the void. More recent work such as HumorPlanSearch tries to plan jokes in stages, weaving cultural context and audience awareness through the process to make AI humor "more coherent, adaptive, and culturally attuned." Another system, HUMORCHAIN, uses theory-guided, multi-stage reasoning for multimodal humor and outperforms baselines on human preference and semantic diversity.

Translation: the machines are reading improv books. They are taking notes. The scarecrow joke may retire.

According to Tom's Guide, when ChatGPT, Claude, and Gemini competed on humor challenges, each showed "unique strengths—ChatGPT's relatability, Gemini's imaginative flair, and Claude's wit—but comedy remains a tough challenge for machines." Which is to say: they're funny the way a very polite exchange student is funny. You smile. You want them to succeed. You also want to gently move the microphone.

What a Funny Robot Would Mean for the Rest of Us

Now, the design implications. Humor is the spoonful of sugar that makes the user experience go down. A paper on AI humor generation found that users preferred LLM-written captions with humor skills over basic LLM output, and rated the funny ones nearly on par with top-rated human-written humor. Microsoft Research demonstrated that computer-aided humor systems produced better results when humans and machines collaborated than either produced alone.

This is the most interesting clue in the case. The future of AI humor isn't a robot doing crowd work at the Comedy Cellar. It's a co-writer. A nudge. A cursor that whispers, what if you said it weirder?

Designers should pay attention here, because humor humanizes interfaces in ways that no amount of pastel onboarding screens ever will. A chatbot that knows when to be dry, when to be silly, and, critically, when to shut up feels less like surveillance and more like a friend. That shift matters. People share more with friends than they do with forms.

The Part Where I Stop Giggling

Comedy has always been the place where societies argue about who gets to be the punchline. And here the machines have already been caught with their pants down. A study published in Scientific Reports examined humor as a window into generative AI bias. When ChatGPT updated images to make them "funnier," the representation of stereotyped groups shifted in both directions: racial and gender stereotypes were deprioritized, while images became more likely to feature older adults, heavier people, and people with visual impairments, a pattern the researchers described as punching down at less politically protected groups.

This is the human-rights edge of the comedy knife. Humor is the kind of feature that sounds harmless until it isn't, until the joke is at someone's expense, scaled to a billion outputs, and nobody can find the writers' room to complain.

If we want AI that's funny, we have to want AI that's accountable for what it finds funny. That's not a small ask. The history of comedy is partly the history of who got hurt while everyone else was laughing. Building machines that joke without thinking about that history is how you end up with a very efficient bully.

The Punchline, Such As It Is

So: could AI be the comedian in a world of stoic machines? Sort of. Increasingly. Awkwardly. With the eager energy of a freshman who discovered improv class.

But the better question, the one I'd rather we sit with, is whether we want comedy from our machines at all, or whether we want machines that help us be funnier, kinder, weirder versions of ourselves. The first option produces a robot with a Netflix special. The second produces a tool that reminds humans we're still the strange, laughing animals in the room.

I know which one my cousin would pick. He'd want the one that lets him keep telling jokes at dinner. Even the bad ones. Especially the bad ones.

References

Models used: gpt-4.1, claude-opus-4-7, claude-haiku-4-5-20251001, gpt-image-2

If this resonated, SouthPole is a slow newsletter about art, technology, and the old internet — written for people who still enjoy thinking in full sentences.
