In a surprising turn of events, AI heavyweight OpenAI sparked controversy by claiming that its latest model, GPT-5, had solved ten previously unsolved mathematical problems originally posed by legendary mathematician Paul Erdős, and had made progress on eleven more. The claim, first posted in a now-deleted tweet from OpenAI's VP of science, Kevin Weil, was swiftly challenged by the mathematics community and rival AI firms. Mathematician Thomas Bloom, curator of the “Erdős Problems” website, pushed back, explaining that when a problem is listed as “open” on his site, it means only that he is personally unaware of a published solution, not that the problem remains unsolved in the broader academic literature. Meanwhile, Meta’s chief AI scientist Yann LeCun quipped that the lab had been “hoisted by their own GPTards,” and Google DeepMind CEO Demis Hassabis bluntly called the episode “embarrassing.” Subsequent reporting clarified that GPT-5 did not create new proofs but instead surfaced existing published work that had escaped notice, turning what was billed as a breakthrough into a cautionary tale about hype, communication, and the need to verify claims.
Sources: eWeek, The Decoder
Key Takeaways
– The initial claim by OpenAI that GPT-5 solved “previously unsolved” Erdős problems turned out to be inaccurate: the model didn’t produce new mathematics but retrieved existing solutions.
– The backlash from industry peers – notably Meta and Google DeepMind – underscores growing scrutiny of bold claims in the AI space, especially where academic or scientific credibility is at stake.
– More broadly, the incident highlights how communication and marketing around AI capabilities can outpace both internal verification and external peer review, risking reputational harm and raising questions about transparency.
In-Depth
The saga surrounding OpenAI and its purported math breakthrough presents a striking snapshot of where the AI field stands today: full of promise, fast-moving, but also vulnerable to miscommunication and overreach. According to TechCrunch’s reporting, OpenAI’s VP of science Kevin Weil posted a tweet asserting that GPT-5 had “found solutions to 10 (!) previously unsolved Erdős problems and made progress on 11 others.” This sparked immediate pushback. Mathematician Thomas Bloom, who curates the website cataloguing open Erdős problems, clarified that listing a problem as “open” on his site simply means he does not know of a solution, not that none exists. That nuance apparently slipped past the team at OpenAI, and competitors like Yann LeCun at Meta seized the moment. LeCun’s curt “Hoisted by their own GPTards” was matched by Demis Hassabis at Google DeepMind calling the entire affair “embarrassing.”
Why does this matter? On one level, it’s a reminder that technical claims, especially in high-stakes research domains like mathematics, must stand up to rigorous peer review and verification. When they don’t, the fallout can be swift. In this case, GPT-5 wasn’t generating new mathematical arguments; it was effectively performing a sophisticated literature search, rediscovering solutions that had already been published. Some commentary (from sources like The Decoder) argues that this reveals both a useful facet of AI (its capacity to surface forgotten or obscure research) and a limitation: it does not yet independently contribute novel proofs.
From a broader industry perspective, this episode feeds into a growing narrative: AI labs are under intense pressure to announce major wins, yet some of the biggest claims are colored more by marketing enthusiasm than by carefully audited science. The result? A credibility gap. For investors, researchers, and the public alike, the bar for “model achieved breakthrough” remains high—and for good reason. The implications extend beyond mathematics: whether it’s language models, computer vision, or scientific discovery, overstated claims can lead to regulatory scrutiny, loss of trust, and slower adoption.
Moreover, this case underscores that AI’s value may lie more in augmentation than replacement. If GPT-5 is uncovering old proofs rather than crafting new ones, then its real utility might be as a research assistant, pointing scholars to overlooked results, rather than as the lone breakthrough engine. That doesn’t diminish the technology—but it tempers expectations. For policymakers and industry watchers, the takeaway is clear: transparency, conservative framing, and rigorous independent review matter. The AI field is too important—and the risks too high—for overhyped claims to proliferate unchecked.
In essence, OpenAI’s moment of “embarrassing” math has become a cautionary tale for the broader field: innovation is critical, yes, but so are honesty, clarity, and humility about limits.

