
NodeRAG: Heterogeneous Graph Retrieval-Augmented Generation

This blog post summarizes a scientific paper introducing NodeRAG, a graph-centric framework that uses heterogeneous graph structures to improve Retrieval-Augmented Generation (RAG). NodeRAG aims to integrate graph methodologies more cleanly into the RAG workflow and to address limitations of existing graph-based RAG methods.

Problem Definition

  • Existing RAG Limitations: Current RAG methods often struggle with multi-hop reasoning and summary-level queries because they make little use of the structure in the underlying data.
  • Graph-based RAG: Aims to enhance retrieval and question-answering by leveraging LLMs to decompose raw data into graph structures. However, many existing graph-based RAG methods do not prioritize graph structure design, leading to integration issues and performance degradation.

Proposed Solution

  • NodeRAG Framework: A graph-centric framework that uses heterogeneous graph structures for seamless integration of graph methodologies into the RAG workflow.
  • Heterogeneous Graph: Decomposes information into distinct node types, including entities, relationships, text chunks, and summaries, for fine-grained retrieval.
  • Key Features:
    • Fine-grained and explainable retrieval.
    • Unified information retrieval across different levels.
  • Pipeline: Graph indexing and graph searching.
    • Graph Indexing: Graph decomposition, graph augmentation, and graph enrichment.
    • Graph Searching: Combines heterograph structure with graph algorithms for efficient information retrieval.
  • Heterograph Structure:
    • Nodes are categorized into types such as Entity (N), Relationship (R), Semantic Unit (S), Attribute (A), High-Level Elements (H), High-Level Overview (O), and Text (T).
    • The heterograph is mathematically defined as G = (V, E, Ψ), where V is the set of nodes, E is the set of edges, and Ψ : V → Types is a mapping function assigning each node to a specific type.
  • Graph Augmentation Methods:
    • Node importance-based augmentation: Selects structurally significant entities using K-core decomposition and betweenness centrality.
    • Community detection-based aggregation: Applies the Leiden algorithm to segment the graph into communities.
  • Graph Enrichment:
    • Integrates the Hierarchical Navigable Small World (HNSW) algorithm for efficient retrieval of semantically similar nodes.
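
To make the heterograph definition and the node importance step above more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the class `Heterograph`, the helpers `core_numbers` and `important_entities`, the threshold `k`, and the toy graph are all hypothetical names and data introduced here. Only the K-core part of the importance step is shown; betweenness centrality, Leiden community detection, and HNSW are left out.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    """The seven node types listed above, keyed by their letter codes."""
    ENTITY = "N"
    RELATIONSHIP = "R"
    SEMANTIC_UNIT = "S"
    ATTRIBUTE = "A"
    HIGH_LEVEL_ELEMENT = "H"
    HIGH_LEVEL_OVERVIEW = "O"
    TEXT = "T"


@dataclass
class Heterograph:
    """G = (V, E, psi): typed nodes with undirected edges (assumed undirected here)."""
    psi: dict = field(default_factory=dict)  # node id -> NodeType; the keys are V
    adj: dict = field(default_factory=lambda: defaultdict(set))  # E as adjacency sets

    def add_node(self, node_id, node_type):
        self.psi[node_id] = node_type
        self.adj[node_id]  # touch the key so isolated nodes are kept

    def add_edge(self, u, v):
        if u not in self.psi or v not in self.psi:
            raise KeyError("both endpoints must be typed nodes")
        self.adj[u].add(v)
        self.adj[v].add(u)

    def nodes_of_type(self, node_type):
        return {n for n, t in self.psi.items() if t is node_type}


def core_numbers(graph):
    """Core number of every node, found by repeatedly peeling a minimum-degree node."""
    degree = {n: len(nbrs) for n, nbrs in graph.adj.items()}
    remaining = set(degree)
    core = {}
    k = 0
    while remaining:
        # Ties are broken by node id so the peeling order is deterministic.
        n = min(remaining, key=lambda x: (degree[x], str(x)))
        k = max(k, degree[n])
        core[n] = k
        remaining.remove(n)
        for m in graph.adj[n]:
            if m in remaining:
                degree[m] -= 1
    return core


def important_entities(graph, k):
    """Entity nodes whose core number is at least k (hypothetical selection rule)."""
    core = core_numbers(graph)
    return {n for n in graph.nodes_of_type(NodeType.ENTITY) if core[n] >= k}


# Toy graph: four entities, one semantic unit, one text chunk (made-up data).
g = Heterograph()
for name in ("a", "b", "c", "d"):
    g.add_node(name, NodeType.ENTITY)
g.add_node("s1", NodeType.SEMANTIC_UNIT)
g.add_node("t1", NodeType.TEXT)
for u, v in [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d"),
             ("s1", "a"), ("s1", "b"), ("t1", "s1")]:
    g.add_edge(u, v)

print(sorted(important_entities(g, k=2)))  # ['a', 'b', 'c']
```

In this toy graph, entity `d` hangs off the densely connected part by a single edge, so its core number is 1 and it is filtered out. The simple peeling loop is quadratic in the number of nodes; a graph library would normally be used for larger graphs.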

Results

  • Benchmarks: Evaluated on HotpotQA, MuSiQue, MultiHop-RAG, and RAG-QA Arena.
  • Baselines: Compared against NaiveRAG, HyDE, GraphRAG, and LightRAG.
  • Metrics: Accuracy (Acc), average number of retrieved tokens (#Token), and win and tie ratio (W+T).
  • Key Findings:
    • NodeRAG outperforms competing methods on HotpotQA, MuSiQue, and MultiHop-RAG, with higher accuracy and fewer retrieved tokens.
    • In pairwise comparisons on RAG-QA Arena, NodeRAG achieves higher win ratios than GraphRAG, LightRAG, NaiveRAG, and HyDE across all six domains.
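
As a rough illustration of how the three metrics above could be computed, the sketch below assumes that Acc is the fraction of correctly answered questions, #Token is the mean number of retrieved tokens per question, and W+T is the share of pairwise comparisons that end in a win or a tie. The record type `QAResult`, the function names, and every number are made up for the example; none of them are results from the paper.

```python
from dataclasses import dataclass


@dataclass
class QAResult:
    correct: bool          # whether the answer was judged correct
    retrieved_tokens: int  # tokens retrieved for this question


def accuracy(results):
    """Fraction of questions answered correctly (Acc)."""
    return sum(r.correct for r in results) / len(results)


def mean_retrieved_tokens(results):
    """Average number of retrieved tokens per question (#Token)."""
    return sum(r.retrieved_tokens for r in results) / len(results)


def win_tie_ratio(outcomes):
    """Share of pairwise outcomes that are 'win' or 'tie' (W+T)."""
    return sum(o in ("win", "tie") for o in outcomes) / len(outcomes)


# Placeholder data, for illustration only.
results = [
    QAResult(True, 4000),
    QAResult(False, 6000),
    QAResult(True, 5000),
    QAResult(True, 5000),
]
outcomes = ["win", "tie", "loss", "win"]

print(accuracy(results))               # 0.75
print(mean_retrieved_tokens(results))  # 5000.0
print(win_tie_ratio(outcomes))         # 0.75
```

Reporting accuracy together with token count matters because a method can raise accuracy simply by retrieving more context; the key finding above is that NodeRAG improves on both at once.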

Importance

  • Enhanced RAG Performance: NodeRAG treats graph structure design as a first-class concern, which yields more effective retrieval than graph-based methods that do not.
  • Fine-Grained Retrieval: Constructs a heterograph with functionally distinct nodes, enabling more precise information retrieval.
  • Improved Reasoning: Outperforms the compared baselines on multi-hop reasoning benchmarks and open-ended retrieval tasks.