
PhiloAgent: Agentic RAG with Execution Graphs

  • The PhiloAgent system employs a stateful execution graph instead of simple prompt chaining to create dynamic and structured workflows for AI agents, enabling more complex reasoning and adaptability.
  • The system architecture consists of several key nodes: a Conversation Node for generating replies, a Retrieval Tool Node for fetching information via MongoDB-powered vector search (agentic RAG), a Summarize Context Node to condense retrieved passages, and a Summarize Conversation Node to keep the history within the LLM's context window (the wiring of these nodes is sketched after this list).
  • Implementation details include: Pydantic for in-memory state management (`PhilosopherState`; see the state sketch after this list), LangChain for tool orchestration, Groq's Llama 70B for low-latency responses, smaller 8B models for summarization, dynamic prompt templates, FastAPI with REST and WebSocket endpoints for real-time serving, and Opik by Comet for monitoring and evaluation.
  • The ReAct pattern is implemented through the `conversation_node`, `retriever_node`, and `summarize_context_node`, enabling the agent to reason, act (retrieve information), and observe (summarize the retrieved context) in a cycle.
  • The system uses conditional edges in LangGraph to decide dynamically whether to summarize the conversation based on its length, optimizing for context window size and cost. Specifically, the `should_summarize_conversation` function checks whether the number of messages exceeds `TOTAL_MESSAGES_SUMMARY_TRIGGER` and, if so, routes to the summarization node (see the routing sketch below).
  • According to additional sources, the PhiloAgent course uses LangGraph to implement the agent, highlighting the trade-off between workflows (reliable but rigid) and agents (adaptable but potentially less reliable). The course uses Groq's `llama-3.3-70b-versatile` model for the main conversation and `llama-3.1-8b-instant` for context summarization, and it emphasizes prompt engineering with a `PHILOSOPHER_CHARACTER_CARD` to guide the agent's behavior (a hedged instantiation sketch follows the list).
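
To make the bullets concrete, the sketches below use Python. First, the in-memory state: the source confirms only that the state is a Pydantic model named `PhilosopherState`, so every field below is an illustrative assumption, not the course's actual schema.

```python
from langchain_core.messages import BaseMessage
from pydantic import BaseModel, Field


class PhilosopherState(BaseModel):
    """In-memory state flowing through the execution graph.

    Only the class name comes from the course; the fields are guesses.
    """

    messages: list[BaseMessage] = Field(default_factory=list)  # chat history
    philosopher_name: str = ""   # active philosopher persona (assumed field)
    retrieved_context: str = ""  # condensed output of the retrieval branch (assumed)
    summary: str = ""            # rolling summary of older turns (assumed)
```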
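
The two Groq models can be wired in through LangChain's `langchain-groq` integration. The model names come from the source; the temperatures and the character-card text are placeholders standing in for the course's actual `PHILOSOPHER_CHARACTER_CARD`.

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_groq import ChatGroq

# Large model for in-character replies, small model for cheap summarization.
conversation_llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0.7)
summary_llm = ChatGroq(model="llama-3.1-8b-instant", temperature=0.0)

# Illustrative stand-in for the course's character-card prompt.
PHILOSOPHER_CHARACTER_CARD = ChatPromptTemplate.from_messages([
    ("system",
     "You are {philosopher_name}. Stay in character, argue from your own "
     "philosophy, and ground every answer in this context:\n{retrieved_context}"),
    ("human", "{question}"),
])
```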
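
The length-based trigger from the conditional-edge bullet reduces to a plain routing function. The function and constant names match the source; the threshold value of 30 is a placeholder, and the returned node name assumes the wiring shown in the final sketch.

```python
from langgraph.graph import END

TOTAL_MESSAGES_SUMMARY_TRIGGER = 30  # placeholder, not the course's actual value


def should_summarize_conversation(state: PhilosopherState) -> str:
    """Route to the summarizer once the history outgrows the trigger."""
    if len(state.messages) > TOTAL_MESSAGES_SUMMARY_TRIGGER:
        return "summarize_conversation_node"
    return END
```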
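
Finally, one plausible LangGraph wiring that closes the ReAct loop: reason in `conversation_node`, act in `retriever_node`, observe in `summarize_context_node`. The node bodies are stubs, and the tool-call-based routing is an assumption about how the cycle is triggered, not the course's confirmed implementation.

```python
from langgraph.graph import StateGraph, START, END


def conversation_node(state: PhilosopherState) -> dict:
    """Stub: answer in character using the large model and any retrieved context."""
    question = state.messages[-1].content if state.messages else ""
    reply = conversation_llm.invoke(
        PHILOSOPHER_CHARACTER_CARD.format_messages(
            philosopher_name=state.philosopher_name,
            retrieved_context=state.retrieved_context,
            question=question,
        )
    )
    return {"messages": state.messages + [reply]}


def retriever_node(state: PhilosopherState) -> dict:
    """Stub: the MongoDB-powered vector search tool would run here."""
    return {"retrieved_context": "raw passages from vector search"}


def summarize_context_node(state: PhilosopherState) -> dict:
    """Stub: condense retrieved passages with the small 8B model."""
    condensed = summary_llm.invoke(f"Summarize: {state.retrieved_context}")
    return {"retrieved_context": condensed.content}


def summarize_conversation_node(state: PhilosopherState) -> dict:
    """Stub: compress older turns into the rolling summary."""
    return {"summary": "compressed history", "messages": state.messages[-2:]}


def route_after_conversation(state: PhilosopherState) -> str:
    """Assumed router: act (retrieve) on a tool call, else maybe summarize."""
    if state.messages and getattr(state.messages[-1], "tool_calls", None):
        return "retriever_node"
    return should_summarize_conversation(state)


builder = StateGraph(PhilosopherState)
builder.add_node("conversation_node", conversation_node)
builder.add_node("retriever_node", retriever_node)
builder.add_node("summarize_context_node", summarize_context_node)
builder.add_node("summarize_conversation_node", summarize_conversation_node)

builder.add_edge(START, "conversation_node")
builder.add_conditional_edges(
    "conversation_node",
    route_after_conversation,
    ["retriever_node", "summarize_conversation_node", END],
)
# Act -> observe -> reason: retrieval feeds the context summarizer,
# which loops back into the conversation node.
builder.add_edge("retriever_node", "summarize_context_node")
builder.add_edge("summarize_context_node", "conversation_node")
builder.add_edge("summarize_conversation_node", END)

graph = builder.compile()
```

Because retrieval loops back into the conversation node, the agent can take several reason-act-observe turns before the conditional edge ends the run or hands off to the conversation summarizer.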