LLM Coherence and Rule Adherence: Challenges and Solutions

  • Long-term coherence in LLM agents is limited by ephemeral internal states: Models trained only to predict the next token exhibit performance degradation and catastrophic failure cascades after approximately 100 simulated business days, as demonstrated by Vending-Bench, because they lack a mechanism for overwriting faulty beliefs with ground truth (a minimal reconciliation sketch follows this list).
  • Enlarging context windows does not solve coherence issues: Experiments show that increasing memory size can worsen outcomes, indicating architectural shortcomings beyond context length limitations.
  • Rule fatigue degrades LLM performance: As the number of simultaneous constraints grows, rule adherence falls predictably, with later instructions ignored due to uneven attention allocation. Experiments testing rule following with up to 800 rules confirm this (a measurement harness is sketched after this list).
  • Knowledge representation impacts coherence: Traditional retrieval pipelines that feed the model raw documents or chunks selected purely by vector similarity amplify these weaknesses, delivering either too little or too much information and forcing the model to infer relationships on its own.
  • Structured knowledge representations improve coherence: Design patterns that externalize structure, such as graph-based knowledge representations and declarative grammars, mitigate these failure modes by constraining what the model sees and how it can act, provided they rest on a well-designed domain ontology (a graph-lookup sketch appears after this list).
  • According to additional sources, LLMs exhibit varying degrees of rule adherence: One experiment found Gemini 2.5 Flash achieved 81% average adherence to 400 rules, while Claude 3.7 Sonnet achieved 60% and GPT-4.1 only 26%.
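
The overwrite-with-ground-truth mechanism referenced in the first bullet can be made concrete with a small sketch. Everything below is illustrative and not taken from Vending-Bench: the state fields, the `reconcile` helper, and the correction log are invented names. The idea is simply that the agent's accumulated beliefs live outside the model, where an authoritative source can correct them explicitly.

```python
# Minimal sketch (illustrative, not from the source) of externalizing
# agent state so faulty beliefs can be overwritten with ground truth.
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Beliefs the model accumulates while acting; these may drift from reality."""
    beliefs: dict[str, float] = field(default_factory=dict)


def reconcile(state: AgentState, ground_truth: dict[str, float]) -> list[str]:
    """Overwrite drifted beliefs with authoritative values.

    Returns a log of corrections that can be injected into the next prompt,
    so the model sees explicit contradictions rather than silently
    compounding its own errors over hundreds of simulated days.
    """
    corrections = []
    for key, actual in ground_truth.items():
        believed = state.beliefs.get(key)
        if believed != actual:
            corrections.append(f"Correction: {key} is {actual}, not {believed}.")
            state.beliefs[key] = actual
    return corrections


state = AgentState(beliefs={"cash_balance": 420.0, "units_in_stock": 12.0})
truth = {"cash_balance": 375.5, "units_in_stock": 9.0}
for line in reconcile(state, truth):
    print(line)
```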
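
A rule-fatigue experiment like the one described above can be run with a simple harness that sweeps the number of simultaneous rules and scores compliance mechanically. This is a sketch under assumptions: `call_model` is a placeholder for whatever model client you use, and the toy token-mention rules stand in for real constraints whose satisfaction you can check programmatically.

```python
# Sketch of a rule-fatigue harness: adherence as a function of how many
# simultaneous rules appear in the prompt. All names are illustrative.
from collections.abc import Callable

Rule = tuple[str, Callable[[str], bool]]  # (instruction text, compliance check)

# Toy constraints that can be verified with a string check; real
# experiments would substitute domain rules and matching validators.
RULES: list[Rule] = [
    (f"The response must contain the token T{i}.", lambda out, i=i: f"T{i}" in out)
    for i in range(800)
]


def call_model(system_prompt: str, user_prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")


def adherence_at(n_rules: int, user_prompt: str) -> float:
    """Fraction of the first n_rules satisfied by a single model response."""
    rules = RULES[:n_rules]
    system = "Follow ALL of these rules:\n" + "\n".join(
        f"{idx + 1}. {text}" for idx, (text, _) in enumerate(rules)
    )
    output = call_model(system, user_prompt)
    return sum(check(output) for _, check in rules) / n_rules


# Sweeping rule counts exposes the degradation curve; comparing pass rates
# for early vs. late rule indices tests the uneven-attention claim.
# for n in (10, 50, 100, 200, 400, 800):
#     print(n, adherence_at(n, "Write a short product description."))
```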
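
To illustrate the contrast between similarity-based retrieval and a structured representation, here is a minimal graph lookup that hands the model explicit typed edges rather than raw text chunks. The triples and the small vending-domain ontology are invented for this example; the point is that the context is bounded, explicit, and relational, so the model does not have to infer the relationships itself.

```python
# Illustrative sketch of graph-backed retrieval over an invented ontology.
from collections import defaultdict

# (subject, relation, object) triples under a small domain ontology
TRIPLES = [
    ("ACME-200", "is_a", "vending_machine"),
    ("ACME-200", "located_in", "Building 4"),
    ("ACME-200", "stocks", "SKU-142"),
    ("SKU-142", "is_a", "energy_bar"),
    ("SKU-142", "reorder_threshold", "5"),
]

graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
for subj, rel, obj in TRIPLES:
    graph[subj].append((rel, obj))


def neighborhood(entity: str, depth: int = 1) -> list[str]:
    """Render the entity's outgoing edges, up to `depth` hops, as statements."""
    statements, frontier = [], [entity]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for rel, obj in graph[node]:
                statements.append(f"{node} {rel} {obj}")
                next_frontier.append(obj)
        frontier = next_frontier
    return statements


# The context handed to the model is a short list of explicit facts:
print("\n".join(neighborhood("ACME-200", depth=2)))
```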