RAG for Legal Information Retrieval with NMF and KG
This blog post summarizes a scientific paper that introduces a generative AI system designed to enhance legal information retrieval and AI reasoning. The system integrates Retrieval-Augmented Generation (RAG), Vector Stores (VS), and Knowledge Graphs (KG) constructed via Non-Negative Matrix Factorization (NMF).
Problem Definition
- Traditional legal information retrieval methods often miss subtle conceptual overlaps and deep contextual cues in legal inquiries.
- The legal domain is complex, encompassing constitutions, statutes, court rules, regulations, ordinances, and case law.
- Traditional methods rely on Boolean logic and lexical indexing (TF-IDF), which may not capture conceptual relationships.
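The limitation of lexical indexing can be illustrated with a small sketch. The code below is a simplified TF-IDF with smoothed inverse document frequency, written for illustration only; the three example texts are made up and are not from the paper's corpus. It shows that two passages describing the same idea in different words receive a similarity of zero, because TF-IDF only rewards shared terms.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using plain TF-IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Smoothed IDF keeps every weight strictly positive.
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example texts (not taken from the paper).
docs = [
    "tenant eviction notice requirements",       # the query
    "landlord must warn renter before removal",  # same idea, no shared words
    "tenant eviction hearing procedure",         # shares keywords with the query
]
query_vec, paraphrase_vec, keyword_vec = tfidf_vectors(docs)

sim_paraphrase = cosine(query_vec, paraphrase_vec)
sim_keyword = cosine(query_vec, keyword_vec)
print(f"paraphrase: {sim_paraphrase:.3f}, keyword overlap: {sim_keyword:.3f}")
```

The paraphrase scores zero despite being conceptually closest to the query, which is the gap that dense embeddings are meant to close.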
Proposed Solution
- The proposed solution is a generative AI system called Smart-SLIC that combines RAG, Vector Stores (VS), and Knowledge Graphs (KG).
- The system uses VS for capturing semantic meaning beyond keyword matching. Tools like BERT and GPT are used to embed legal texts into dense vector representations.
- It employs KG to formalize relationships between legal concepts (statutes, cases, doctrines), enabling structured navigation and explicit linking of legal authorities, using Neo4j.
- NMF is applied to uncover latent topics and patterns in unstructured text, factorizing word-embedding matrices into interpretable topics. Tensor Extraction of Latent Features (T-ELF) is combined with automatic model selection (NMFk).
- Limitations noted in the paper include incomplete author attribution in networks, the need for additional datasets, and the need to systematically reconcile informal post-decree agreements with formal judgments.
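One way to picture how vector search and a knowledge graph could feed a RAG prompt is sketched below. This is a toy in-memory illustration, not Smart-SLIC's implementation: the document IDs, texts, three-dimensional embeddings, relation names, and function names are all hypothetical, and plain dictionaries stand in for a real vector store and graph database.

```python
import math

# Toy "vector store": id -> (text, embedding). The embeddings are invented
# 3-dimensional vectors standing in for real model output.
VECTOR_STORE = {
    "const-A": ("Protection against unreasonable searches.", [0.9, 0.1, 0.0]),
    "statute-B": ("Definition of burglary.", [0.1, 0.9, 0.1]),
    "case-C": ("Ruling on a warrantless vehicle search.", [0.8, 0.3, 0.1]),
    "statute-D": ("Procedure for issuing search warrants.", [0.2, 0.2, 0.9]),
}

# Toy "knowledge graph": id -> list of (relation, target id). Hypothetical edges.
KNOWLEDGE_GRAPH = {
    "case-C": [("INTERPRETS", "const-A"), ("CITES", "statute-D")],
}


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def vector_search(query_embedding, k=2):
    """Return the ids of the k entries most similar to the query embedding."""
    ranked = sorted(
        VECTOR_STORE,
        key=lambda doc_id: cosine(query_embedding, VECTOR_STORE[doc_id][1]),
        reverse=True,
    )
    return ranked[:k]


def expand_with_graph(doc_ids):
    """Add authorities that the retrieved documents link to in the graph."""
    expanded = list(doc_ids)
    for doc_id in doc_ids:
        for _relation, target in KNOWLEDGE_GRAPH.get(doc_id, []):
            if target not in expanded:
                expanded.append(target)
    return expanded


def build_prompt(question, doc_ids):
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n".join(f"[{doc_id}] {VECTOR_STORE[doc_id][0]}" for doc_id in doc_ids)
    return f"Answer using only the sources below.\n{context}\nQuestion: {question}"


question = "When may police search a car without a warrant?"
query_embedding = [1.0, 0.2, 0.0]  # assumed output of an embedding model

hits = vector_search(query_embedding, k=2)
context_ids = expand_with_graph(hits)
prompt = build_prompt(question, context_ids)
print(prompt)
```

In this sketch the warrant-procedure statute is not among the nearest neighbours of the query, but it still reaches the prompt because a retrieved case cites it. That is the kind of explicit linking a knowledge graph adds on top of similarity search.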
Results
- The system was tested on a dataset comprising:
  - Constitution: 265 sections
  - Statutes: 28,251 sections
  - Court of Appeals: 10,072 cases
  - Supreme Court: 5,727 cases
- Data was decomposed hierarchically with NMFk.
- Legal citations were collected using GPT-3.5 Turbo (gpt-3.5-turbo).
- The QA performance of Smart-SLIC was evaluated with several performance metrics, and retrieval methods operating in the embedding space were compared.
- The system was evaluated through case studies, including constitutional, statutory, and case law analyses.
Core Technologies
- Vector Stores (VS): Embed legal texts into dense vector representations (e.g., with BERT or GPT) to capture semantic meaning beyond keyword matching; implemented with Milvus.
- Knowledge Graphs (KG): Formalize relationships between legal concepts (statutes, cases, doctrines), enabling structured navigation and explicit linking of legal authorities, using Neo4j.
- Non-Negative Matrix Factorization (NMF): Uncover latent topics and patterns in unstructured text, factorizing word-embedding matrices into interpretable topics.
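To make the NMF step more concrete, here is a minimal pure-Python sketch of the standard multiplicative-update rule (Lee and Seung), which approximates a non-negative matrix X as the product W·H. It is an illustration under stated assumptions, not the T-ELF or NMFk code: the document-term counts are invented, the helper names are hypothetical, and the number of topics k is fixed by hand, whereas NMFk selects it automatically.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(x, w, h):
    """Frobenius norm of the reconstruction residual X - W.H."""
    wh = matmul(w, h)
    return sum((xv - av) ** 2 for xr, ar in zip(x, wh) for xv, av in zip(xr, ar)) ** 0.5


def nmf(x, k, iterations=200, seed=0, eps=1e-9):
    """Factor x (m x n) into w (m x k) and h (k x n) with non-negative entries."""
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    # Strictly positive random initialisation.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T X) / (W^T W H)
        wt = transpose(w)
        numer = matmul(wt, x)
        denom = matmul(matmul(wt, w), h)
        h = [[h[i][j] * numer[i][j] / (denom[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # W <- W * (X H^T) / (W H H^T)
        ht = transpose(h)
        numer = matmul(x, ht)
        denom = matmul(w, matmul(h, ht))
        w = [[w[i][j] * numer[i][j] / (denom[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return w, h


# Invented document-term counts. Columns: search, warrant, seizure, lease, tenant, rent
X = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 2, 0, 0, 0],
    [0, 0, 0, 3, 2, 1],
    [0, 0, 0, 1, 3, 2],
]

W0, H0 = nmf(X, k=2, iterations=0)  # same seed, so this is the starting point
initial_error = frobenius_error(X, W0, H0)
W, H = nmf(X, k=2)
final_error = frobenius_error(X, W, H)
print(f"reconstruction error: {initial_error:.3f} -> {final_error:.3f}")
```

Each row of H can be read as a topic (a weighting over terms) and each row of W as a document's mixture of topics. Because every entry stays non-negative, topics combine additively, which is what makes the factors interpretable.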
Importance
- The system advances computational law by providing a scalable and interpretable method for retrieving and reasoning over complex legal corpora.
- The experimental results show that chunking combined with hierarchical NMFk improves accuracy.
- Future directions include refining the citation extraction pipeline and expanding the collection to encompass broader legal instruments.
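The chunking mentioned in the experimental results can be sketched as a sliding window over words. The summary does not describe the paper's chunking strategy, so the function name, the word-based splitting, and the `chunk_size` and `overlap` parameters below are assumptions chosen only to illustrate the general idea.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into word windows of chunk_size, sharing overlap words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


# A 12-word placeholder "section" split into 5-word chunks with 2 words of overlap.
section = " ".join(f"w{i}" for i in range(12))
chunks = chunk_words(section, chunk_size=5, overlap=2)
for chunk in chunks:
    print(chunk)
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing some text twice.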