DeepSeek-R1 & OpenAI’s o1 Still Aren’t As Intelligent As Portrayed.

Here are the results of testing R1 and o1 on the toughest mathematical questions known to humans.

4 min read · Feb 4, 2025

There’s a lot of enthusiasm (and fear) around AGI approaching soon.

The release of OpenAI’s o1 led to a big boost in this conviction.

And now we have DeepSeek-R1, a powerful open-source reasoning model that beats o1 on many benchmarks, further fuelling the conviction.

DeepSeek-R1 beats OpenAI-o1 on many mathematical and coding benchmarks (Image obtained from the original research paper)

The benchmark results are impressive, but how do these models fare on problems from a genuinely tough mathematical benchmark called FrontierMath?

Here Comes ‘FrontierMath’

FrontierMath contains hundreds of exceptionally challenging problems from different mathematical domains crafted by expert mathematicians.

These problems are so hard that solving a typical one requires multiple hours of effort from a researcher in the relevant branch of mathematics.

For the harder questions, it takes them multiple days!

Written by Dr. Ashish Bamania

🍰 I simplify the latest advances in AI, Quantum Computing & Software Engineering for you | 📰 Subscribe to my newsletter here: https://intoai.pub
