Limitations of Large Reasoning Models: A Deep Dive

  • LRMs improve performance through inference-time scaling (generating multiple candidate solutions and selecting among them; see the best-of-N sketch after this list) and through post-training on derivational traces (incorporating intermediate reasoning steps into the training data).
  • LRMs primarily compile verification signals into dynamic methods for retrieving information from memory, rather than engaging in true reasoning. Evidence: intermediate "chains of thought" can be semantically incorrect yet still yield correct final answers.
  • Despite their performance gains, LRMs are essentially "better generators" that produce a higher density of correct solution guesses; they do not demonstrate genuine reasoning capabilities.
  • LRMs incur variable computational costs that grow with problem complexity, unlike vanilla LLMs whose completion costs are predictable, potentially disrupting current LLM business models (a rough cost illustration follows the list).
  • According to additional sources, LRMs improve reasoning and planning by building on LLM architectures, but they still suffer from generalization failures and hallucination, and their performance can be brittle to prompt variations.
  • Additional sources also note that LRMs employ techniques such as self-consistency (choosing the most common answer among multiple generated solutions; a sketch follows the list) and verification-based approaches that check the correctness of LLM outputs.
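
To make the generate-and-select idea concrete, here is a minimal best-of-N sketch in Python. The generate_candidate and verify functions are hypothetical stand-ins for an LLM sampling call and an external checker; the sketch illustrates the technique, not any particular LRM's implementation.

```python
import random

def generate_candidate(prompt: str) -> str:
    """Hypothetical stand-in for one LLM sample (temperature > 0)."""
    return random.choice(["answer_a", "answer_b", "answer_c"])

def verify(prompt: str, candidate: str) -> bool:
    """Hypothetical external verifier; here a mock ground-truth check."""
    return candidate == "answer_b"

def best_of_n(prompt: str, n: int = 8) -> str | None:
    """Inference-time scaling: draw up to n samples and return the first
    candidate the verifier accepts. A 'better generator' raises the chance
    that any single sample is correct, so fewer samples are needed on average.
    """
    for _ in range(n):
        candidate = generate_candidate(prompt)
        if verify(prompt, candidate):
            return candidate
    return None  # no verified solution within the sample budget

print(best_of_n("Solve the puzzle:"))
```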
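
A rough illustration of the cost point, under an assumed token price: a vanilla completion's cost is roughly bounded by its answer length, while an LRM's per-query cost also includes a reasoning trace whose length varies with the instance, so spend becomes hard to predict upfront.

```python
PRICE_PER_1K_TOKENS = 0.01  # assumed illustrative price, not a real rate

def vanilla_cost(answer_tokens: int) -> float:
    """Vanilla LLM: cost is bounded by the (predictable) answer length."""
    return answer_tokens / 1000 * PRICE_PER_1K_TOKENS

def lrm_cost(answer_tokens: int, reasoning_tokens: int) -> float:
    """LRM: reasoning tokens grow with problem difficulty, so per-query
    cost varies per instance rather than fitting a fixed budget."""
    return (answer_tokens + reasoning_tokens) / 1000 * PRICE_PER_1K_TOKENS

print(vanilla_cost(200))        # 0.002 for every query
print(lrm_cost(200, 500))       # 0.007 on an easy instance
print(lrm_cost(200, 20_000))    # 0.202 on a hard one, ~100x the vanilla cost
```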
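
A minimal self-consistency sketch, assuming a hypothetical sample_answer function in place of a temperature-sampled LLM call: draw n answers and return the most frequent one.

```python
import random
from collections import Counter

def sample_answer(prompt: str) -> str:
    """Hypothetical stand-in for one sampled LLM answer."""
    return random.choice(["42", "42", "41"])  # mock answer distribution

def self_consistency(prompt: str, n: int = 16) -> str:
    """Majority vote: the most common answer across n samples wins."""
    votes = Counter(sample_answer(prompt) for _ in range(n))
    return votes.most_common(1)[0][0]

print(self_consistency("What is 6 * 7?"))
```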