DeepSeek-R1 & OpenAI’s o1 Still Aren’t As Intelligent As Portrayed.
Here are the results of testing R1 and o1 on the toughest mathematical questions known to humans.
There’s a lot of enthusiasm (and fear) around AGI approaching soon.
The release of OpenAI’s o1 led to a big boost in this conviction.
And now, we have DeepSeek-R1, a powerful open-source reasoning model that beats o1 on many benchmarks, further fuelling the conviction train.
The benchmark results are cool, but how about we test these models on problems from a very tough mathematical benchmark called FrontierMath?
Here Comes ‘FrontierMath’
FrontierMath contains hundreds of exceptionally challenging problems, crafted by expert mathematicians, from different mathematical domains.
These problems are so hard that solving a typical one requires multiple hours of effort from a researcher in the relevant branch of mathematics.
The harder questions can take multiple days!