
AI Agent Reliability: Exponential Decay and Task Design

  • **Core Finding:** AI agent reliability decays exponentially with task length, exhibiting a constant hazard rate: the probability of failure per step stays the same, irrespective of prior performance. This is supported by METR's results and Toby Ord's analysis.
  • **Half-Life and Progress:** The "half-life" (the task length at which the success rate drops to 50%) of top AI agents is doubling approximately every 7 months, indicating exponential progress in agent endurance across tasks like cybersecurity, ML coding, and reasoning (a rough extrapolation appears in the last sketch after this list).
  • **Implications for Task Design:** To achieve high success rates, tasks must be much shorter than the agent's 50%-success time horizon (roughly 700x shorter for 99.9% reliability), necessitating modular task design, fallback plans, and memory-aware agents (the first sketch after this list reproduces these factors numerically).
  • **Predictive Modeling:** Agent failure can be predicted from task length; doubling the task duration squares the success probability (e.g., 80% success on a 30-min task implies ~64% on a 60-min task), highlighting the limitations of blindly chaining tasks (see the chaining sketch below).
  • **Human vs. AI Performance:** Human performance does not exhibit the same steep exponential decay, because humans can reflect on and correct mistakes; this suggests current AI agents lack the ability to recognize and recover from errors during long tasks.
  • _According to additional sources:_ Survival analysis, using a constant hazard rate model, accurately models the exponentially declining success rate of AI agents with task length, fitting data from Kwa et al. (2025) and predicting relationships between time horizons for different success rates (e.g., T80 ≈ 1/3 T50, T90 ≈ 1/7 T50). A key limitation is that the analysis is based on a specific task suite that may not generalize to other domains.
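
The relationships above follow directly from the constant-hazard model. A minimal sketch in Python, assuming a hypothetical agent with a one-hour 50% time horizon (T50 is the only parameter; the survival function is S(t) = 0.5^(t/T50)):

```python
import math

def success_prob(task_minutes: float, t50_minutes: float) -> float:
    """Survival function of a constant-hazard (exponential) model:
    S(t) = exp(-lambda * t) with lambda = ln(2) / T50,
    which simplifies to S(t) = 0.5 ** (t / T50)."""
    return 0.5 ** (task_minutes / t50_minutes)

def horizon_for_success(target: float, t50_minutes: float) -> float:
    """Longest task length completed with probability `target`:
    solve 0.5 ** (t / T50) = target  =>  t = T50 * ln(target) / ln(0.5)."""
    return t50_minutes * math.log(target) / math.log(0.5)

T50 = 60.0  # hypothetical agent with a one-hour 50% time horizon
for target in (0.8, 0.9, 0.999):
    t = horizon_for_success(target, T50)
    print(f"T{target:.1%}: {t:6.2f} min  (T50 / {T50 / t:.0f})")
# T80.0%:  19.31 min  (T50 / 3)
# T90.0%:   9.12 min  (T50 / 7)
# T99.9%:   0.09 min  (T50 / 693)
```

The output matches the ratios cited above: T80 ≈ T50/3, T90 ≈ T50/7, and the ~700x shortening needed for 99.9% reliability.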
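
The chaining prediction is the same model read multiplicatively: under a constant hazard, S(2t) = S(t)², so doubling a task's length squares its success probability. A toy calculation using the 30-minute example from the list:

```python
# Chaining prediction under a constant hazard: S(2t) = S(t) ** 2.
p_30 = 0.80              # observed success rate on a 30-minute task
p_60 = p_30 ** 2         # predicted success on a 60-minute task
p_120 = p_30 ** 4        # predicted success on a 120-minute task
print(f"{p_60:.2f} {p_120:.3f}")  # 0.64 0.410
```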
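
The 7-month doubling claim can likewise be turned into a rough extrapolation. A sketch under assumed inputs (the 60-minute starting T50 and the 24-month window are illustrative, not from the source):

```python
def projected_t50(t50_now_minutes: float, months_ahead: float,
                  doubling_months: float = 7.0) -> float:
    """Extrapolate the 50% time horizon under the reported ~7-month doubling."""
    return t50_now_minutes * 2.0 ** (months_ahead / doubling_months)

# Hypothetical: an agent with a 60-minute T50 today, projected 2 years out.
print(f"{projected_t50(60.0, 24):.0f} minutes")  # 646 minutes (~10.8 hours)
```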