Lightweight, private, and customizable retrieval-augmented chatbot running entirely on your Mac.
Based on the excellent work by pruthvirajcyn and his Medium article.
## ⚙️ About This Project
This is my personal implementation of a local RAG (Retrieval-Augmented Generation) chatbot using:
- Ollama for running open-source LLMs and embedding models locally.
- Streamlit for a clean and interactive chat UI.
- ChromaDB for storing and querying vector embeddings.
As of 2025-07-17, I’m using:
- 🔍 Embedding model: `nomic-embed-text-v2-moe`
- 🧠 LLM: `gemma3n`
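Both models are served through Ollama's local API. For reference, here's a minimal sketch of calling each one from Python, assuming the `ollama` client package (`pip install ollama`); the repo's own scripts may wire this up differently:

```python
# Minimal sketch using the ollama Python client (an assumption, not the repo's code).
import ollama

# Embed a piece of text with the local embedding model.
emb = ollama.embeddings(model="toshk0/nomic-embed-text-v2-moe:Q6_K",
                        prompt="What is retrieval-augmented generation?")
print(len(emb["embedding"]))  # dimensionality of the embedding vector

# Generate a reply with the local LLM.
resp = ollama.chat(model="gemma3n",
                   messages=[{"role": "user", "content": "Say hello."}])
print(resp["message"]["content"])
```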
## 💡 Why Run a RAG Locally?
- 🔒 Privacy: No data is sent to the cloud. Upload and query your documents entirely offline.
- 💸 Cost-effective: No API tokens or cloud GPU costs; you only pay for electricity.
- 📚 Better than summarizing: With long PDFs or multiple documents, even summaries may not contain the context you need. A RAG chatbot can drill deeper and provide contextual answers.
✅ Recommended: at least 16GB of RAM on your Mac, preferably 24GB+ for a smoother experience.
## 🛠️ 1. Installation
### 1. Clone the Repository
```bash
git clone https://github.com/eplt/RAG_Ollama_Mac.git
cd RAG_Ollama_Mac
```
### 2. Create a Virtual Environment
```bash
python3 -m venv venv
source venv/bin/activate
```
### 3. Install Dependencies
```bash
pip install -r ./src/requirements.txt
```
## 🚀 2. Usage
### 1. Start Ollama and Pull the Models
```bash
ollama serve
ollama pull gemma3n
ollama pull toshk0/nomic-embed-text-v2-moe:Q6_K
```
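If you want to confirm the server is up and both models are present before going further, a quick check against Ollama's local REST endpoint (default port 11434) looks like this; `requests` is assumed installed:

```python
# Quick sanity check against Ollama's default local endpoint.
import requests

r = requests.get("http://localhost:11434/api/tags", timeout=5)
r.raise_for_status()
names = [m["name"] for m in r.json().get("models", [])]
print(names)  # should include gemma3n and the nomic embedding model
```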
### 2. Load Documents
Place your `.pdf` files in the `data/` directory, then run:
```bash
python ./src/load_docs.py
```
To reset and reload the vector database:
```bash
python ./src/load_docs.py --reset
```
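For orientation, the ingestion step amounts to: read each PDF, split the text into overlapping chunks, embed every chunk, and write it all to a persistent Chroma collection. A rough sketch of that flow, assuming `pypdf`, the `ollama` client, and an illustrative collection name (the actual `load_docs.py` may use different libraries and values):

```python
# Illustrative ingestion flow, not the repo's actual load_docs.py.
from pathlib import Path

import chromadb
import ollama
from pypdf import PdfReader

def chunk(text: str, size: int = 800, overlap: int = 100):
    """Split text into fixed-size character chunks that overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

client = chromadb.PersistentClient(path="chroma")   # on-disk vector store
col = client.get_or_create_collection("docs")       # illustrative collection name

for pdf in Path("data").glob("*.pdf"):
    text = "\n".join(page.extract_text() or "" for page in PdfReader(pdf).pages)
    for i, piece in enumerate(chunk(text)):
        # Embed each chunk with the same model the chatbot will use at query time.
        vec = ollama.embeddings(model="toshk0/nomic-embed-text-v2-moe:Q6_K",
                                prompt=piece)["embedding"]
        col.add(ids=[f"{pdf.name}-{i}"], embeddings=[vec],
                documents=[piece], metadatas=[{"source": pdf.name}])
```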
### 3. Launch the Chatbot Interface
```bash
streamlit run ./src/UI.py
```
### 4. Start Chatting
Ask questions, and the chatbot will respond using relevant context retrieved from your documents.
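Under the hood, each question is embedded with the same model used at ingestion time, the top-K most similar chunks are fetched from ChromaDB, and those chunks are stuffed into the prompt sent to `gemma3n`. A hedged sketch of that loop as a minimal Streamlit app (the real `UI.py` will differ; the prompt wording and collection name are illustrative):

```python
# Illustrative retrieval-and-answer loop, not the repo's actual UI.py.
import chromadb
import ollama
import streamlit as st

col = chromadb.PersistentClient(path="chroma").get_or_create_collection("docs")

st.title("Local RAG Chatbot")
if question := st.chat_input("Ask about your documents"):
    st.chat_message("user").write(question)

    # 1. Embed the question with the same model used at ingestion time.
    qvec = ollama.embeddings(model="toshk0/nomic-embed-text-v2-moe:Q6_K",
                             prompt=question)["embedding"]

    # 2. Pull the top-K most similar chunks from the vector store.
    hits = col.query(query_embeddings=[qvec], n_results=5)
    context = "\n\n".join(hits["documents"][0])

    # 3. Ask the LLM to answer using only the retrieved context.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    answer = ollama.chat(model="gemma3n",
                         messages=[{"role": "user", "content": prompt}])
    st.chat_message("assistant").write(answer["message"]["content"])
```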
## 🧩 3. Customization
- ✏️ **Modify Prompts**: Update the prompt templates in `UI.py` to guide the chatbot’s tone or behavior.
- 🔄 **Try Different Models**: Ollama supports various LLMs and embedding models. Run `ollama list` to see what’s available, or try pulling new ones.
- ⚙️ **Tune Retrieval Parameters**: Adjust chunk size, overlap, or top-K retrieval values in `load_docs.py` for improved performance (see the sketch after this list).
- 🚀 **Extend the Interface**: Add features like file upload, chat history, user authentication, or export options using Streamlit’s powerful features.
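On retrieval tuning specifically, the knobs boil down to three numbers. An illustrative set of constants you might expose near the top of `load_docs.py` (names and defaults are mine, not the repo's):

```python
# Illustrative tuning knobs; names and defaults are not from the repo.
CHUNK_SIZE = 800     # characters per chunk: larger = more context per chunk, fewer chunks
CHUNK_OVERLAP = 100  # characters shared between neighboring chunks, so a fact that
                     # straddles a boundary still appears whole in at least one chunk
TOP_K = 5            # chunks retrieved per question: higher = broader recall,
                     # but a longer prompt and slower generation
```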
## 🧯 4. Troubleshooting
- **Ollama not running?** Make sure `ollama serve` is active in a terminal tab.
- **Missing models?** Run `ollama list` to verify the models downloaded correctly.
- **Dependency issues?** Double-check your Python version (3.7+) and re-create the virtual environment.
- **Streamlit errors?** Ensure you’re running the app from the correct path and that your virtual environment is activated.
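One more check worth knowing: if answers come back empty or generic, confirm the vector store was actually populated. A quick hedged snippet (path and collection name are illustrative):

```python
# Verify the Chroma store was populated by load_docs.py (names illustrative).
import chromadb

col = chromadb.PersistentClient(path="chroma").get_or_create_collection("docs")
print(col.count())         # number of stored chunks; 0 means ingestion failed
print(col.peek(3)["ids"])  # a few sample chunk IDs
```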
## 📌 Notes & Future Plans
- Planning to support non-PDF formats (Markdown, `.txt`, maybe HTML).
- Will experiment with additional LLMs like `phi-3`, `mistral`, and `llama3`.
- Might integrate chat history persistence and better document management.
## 👋 Final Thoughts
Local RAG is now more accessible than ever. With powerful small models and tools like Ollama, anyone can build a private, intelligent assistant — no cloud needed.
If you found this useful or have ideas to improve it, feel free to open a PR or drop a star ⭐️