Local Hybrid RAG System with OpenSearch and LLM
- The project implements a local hybrid RAG system that combines BM25 keyword search with semantic (vector) search, built from open-source components for experimentation and privacy. The architecture comprises a Streamlit app, OCR via PyTesseract, a RAG ingestion pipeline, OpenSearch as the vector database, hybrid search, prompt templating, and a local LLM served by Ollama; an end-to-end query sketch follows this list.
- The RAG ingestion pipeline cleans text, chunks it, extracts entities, and generates embeddings, transforming raw text into structured, retrievable records. OpenSearch stores both the text and the embeddings, enabling retrieval by vector similarity as well as traditional keyword search (see the ingestion sketch below).
- Components are swappable: LLMs, OCR methods, and embedding models from Hugging Face can all be exchanged, giving flexibility and control over privacy. Possible enhancements include smarter chunking methods, fine-tuned embeddings, alternative LLMs, optimized OCR, and richer metadata for advanced search.
- According to additional sources, the setup relies on Docker for OpenSearch, Ollama for LLMs, and Python 3.11. Hybrid search must be enabled explicitly in OpenSearch by creating a search pipeline that normalizes the BM25 and vector scores and combines them with a weighted arithmetic mean (e.g., weights 0.3 and 0.7); a configuration sketch follows this list.
- The Python environment requires the dependencies in `requirements.txt`, including Streamlit, SentenceTransformer, and PyTesseract. Configuration lives in `constants.py` and covers the embedding model path (`EMBEDDING_MODEL_PATH`), embedding dimension (`EMBEDDING_DIMENSION`), text chunk size (`TEXT_CHUNK_SIZE`), and Ollama model name (`OLLAMA_MODEL_NAME`); an example `constants.py` appears after this list.
- Additional sources suggest the system likely also uses LangChain and Hugging Face Transformers. Overall performance depends on retrieval speed, generation speed, and the quality of the embeddings and the retrieved documents.
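
To make the ingestion steps concrete, here is a minimal Python sketch of the clean → chunk → embed → index flow. The index name `documents`, the field names `text` and `embedding`, the 384-dimension MiniLM model, and the character-based chunking are illustrative assumptions rather than the post's exact code; entity extraction is omitted for brevity, and a local OpenSearch with security disabled is assumed.

```python
import re

from opensearchpy import OpenSearch, helpers
from sentence_transformers import SentenceTransformer

INDEX_NAME = "documents"   # assumed index name
CHUNK_SIZE = 500           # characters; stands in for TEXT_CHUNK_SIZE

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # 384-dim

# One-time setup: a k-NN-enabled index whose vector dimension matches the model.
if not client.indices.exists(index=INDEX_NAME):
    client.indices.create(index=INDEX_NAME, body={
        "settings": {"index": {"knn": True}},
        "mappings": {"properties": {
            "text": {"type": "text"},
            "embedding": {"type": "knn_vector", "dimension": 384},
        }},
    })

def clean(text: str) -> str:
    # Collapse the whitespace noise that OCR output tends to contain.
    return re.sub(r"\s+", " ", text).strip()

def chunk(text: str, size: int = CHUNK_SIZE) -> list[str]:
    # Naive fixed-size chunking; the post notes smarter methods as an upgrade.
    return [text[i:i + size] for i in range(0, len(text), size)]

def ingest(doc_id: str, raw_text: str) -> None:
    chunks = chunk(clean(raw_text))
    vectors = embedder.encode(chunks)  # one embedding per chunk
    helpers.bulk(client, (
        {"_index": INDEX_NAME,
         "_id": f"{doc_id}-{i}",
         "_source": {"text": c, "embedding": v.tolist()}}
        for i, (c, v) in enumerate(zip(chunks, vectors))
    ))
```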
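
The hybrid-search pipeline described above can be created with a single REST call. A sketch, assuming a local unsecured cluster and a pipeline name of `hybrid-pipeline`; the `min_max` normalization technique is a common choice assumed here, while the arithmetic-mean combination with weights 0.3 and 0.7 comes from the summary.

```python
import requests

pipeline = {
    "description": "Normalize BM25 and k-NN scores, then combine them",
    "phase_results_processors": [{
        "normalization-processor": {
            "normalization": {"technique": "min_max"},  # assumed technique
            "combination": {
                "technique": "arithmetic_mean",
                # One weight per sub-query, in the order they appear in
                # the hybrid query: 0.3 for BM25, 0.7 for the vector leg.
                "parameters": {"weights": [0.3, 0.7]},
            },
        }
    }],
}
requests.put(
    "http://localhost:9200/_search/pipeline/hybrid-pipeline",
    json=pipeline,
).raise_for_status()
```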
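
And here is a sketch of the query path, tying hybrid retrieval, prompt templating, and Ollama generation together. It reuses the assumed index, field names, and pipeline name from the sketches above; the prompt template and the `llama3` model name are likewise placeholders.

```python
import requests
from opensearchpy import OpenSearch
from sentence_transformers import SentenceTransformer

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def answer(question: str, model: str = "llama3") -> str:
    # Hybrid retrieval: a BM25 match query and a k-NN query, scored
    # together by the `hybrid-pipeline` search pipeline configured above.
    hits = client.search(
        index="documents",
        body={
            "size": 3,
            "query": {"hybrid": {"queries": [
                {"match": {"text": {"query": question}}},
                {"knn": {"embedding": {
                    "vector": embedder.encode(question).tolist(), "k": 3}}},
            ]}},
        },
        params={"search_pipeline": "hybrid-pipeline"},
    )["hits"]["hits"]

    # Prompt templating: stuff the retrieved chunks into the context.
    context = "\n\n".join(h["_source"]["text"] for h in hits)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

    # Local generation through Ollama's HTTP API.
    resp = requests.post("http://localhost:11434/api/generate",
                         json={"model": model, "prompt": prompt, "stream": False})
    resp.raise_for_status()
    return resp.json()["response"]
```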
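
Finally, a possible shape for `constants.py`; the variable names come from the summary, while every value is a placeholder to adapt.

```python
# constants.py -- values are illustrative; adjust to your models and hardware.
EMBEDDING_MODEL_PATH = "sentence-transformers/all-MiniLM-L6-v2"  # HF id or local path
EMBEDDING_DIMENSION = 384     # must match the embedding model's output size
TEXT_CHUNK_SIZE = 500         # characters per chunk during ingestion
OLLAMA_MODEL_NAME = "llama3"  # any model already pulled into Ollama
```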