“I was curious to establish a baseline for when LLMs are effectively able to solve open math problems compared to where they struggle,” Somani said. The surprise was that, using the latest model, the frontier started to push forward a bit.
For anyone skeptical of machine intelligence, it’s a surprising result — and it’s not the only one. AI tools have become ubiquitous in mathematics, from formalization-oriented LLMs like Harmonic’s Aristotle to literature review tools like OpenAI’s deep research. But since the release of GPT 5.2 — which Somani describes as “anecdotally more skilled at mathematical reasoning than previous iterations” — the sheer volume of solved problems has become difficult to ignore, raising new questions about large language models’ ability to push the frontiers of human knowledge.
Since Christmas, 15 problems have been moved from “open” to “solved” on the Erdős website — and 11 of the solutions have specifically credited AI models as involved in the process.
“As such, many of these easier Erdős problems are now more likely to be solved by purely AI-based methods than by human or hybrid means,” Tao continued.
Another driving force is a recent shift toward formalization, a labor-intensive task that makes mathematical reasoning easier to verify and extend. Formalization doesn’t require use of AI or even computers, but a new crop of automated tools have made the process far easier. The open source “proof assistant” Lean, which was developed at Microsoft Research in 2013, has become widely used within the field as a way of formalizing proof— and AI tools like Harmonic’s Aristotle promise to automate much of the work of formalization.
For Harmonic founder Tudor Achim, the sudden jump in solved Erdős problems is less important than the fact that the world’s greatest mathematicians are starting to take those tools seriously. “I care more about the fact that math and computer science professors are using [AI tools],” Achim said. “These people have reputations to protect, so when they’re saying they use Aristotle or they use ChatGPT, that’s real evidence.”
International