Overview
haiku.rag is an agentic RAG system that runs locally and scales to production. Index PDFs, web pages, or whole directories. Ask questions and get cited answers. Build agents, skills, and MCP integrations on top.
haiku.rag is open-source first. The defaults run open models through Ollama so the full pipeline works without external API keys. Any provider Pydantic AI supports works in its place.
Built on LanceDB, Pydantic AI, and Docling. Embedded database, no servers required.
See it work
uv pip install haiku.rag
ollama pull qwen3-embedding:4b
ollama pull gpt-oss
haiku-rag init
haiku-rag add-src ~/Documents/some-paper.pdf
haiku-rag chat
The chat TUI is one way to interact with the database. haiku-rag ask and haiku-rag search cover one-shot CLI usage. Python integrations, skills, and the MCP server work against the same database.
What it does
Ingest. PDFs, DOCX, HTML, images, and 40+ formats via Docling. Add files, URLs, or whole directories with haiku-rag add-src, or run the haiku-ingester service for continuous, queue-backed ingestion from filesystem, HTTP, S3, or WebDAV sources.
Search. Hybrid retrieval (vector + full-text with reciprocal rank fusion), optional cross-encoder reranking, structure-aware context expansion. Image-as-query and cross-modal retrieval when configured with a multimodal embedder.
Answer. RAG skill with citations including page numbers, section headings, and visual grounding. Vision-capable models receive figure bytes alongside chunk text. Analysis skill with a sandboxed Python interpreter for aggregation and computation across documents.
Integrate. Use it from Python, the CLI, the MCP server, or as composable skills built on haiku.skills. Skills bundle tools, prompts, and state for use inside any Pydantic AI agent.
Operate. Embedded LanceDB by default. Also runs on S3, GCS, Azure, or LanceDB Cloud. Time-travel queries via LanceDB versioning. The haiku-ingester service runs continuously for production deployments.
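The reciprocal rank fusion step named under Search can be illustrated with a small standalone sketch. This is not haiku.rag's implementation: the function name, the chunk IDs, and the constant k=60 (a value commonly used for this technique) are assumptions chosen for illustration. It shows how a vector ranking and a full-text ranking could be merged into one ordering by summing 1 / (k + rank) for each result across both lists.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first.

    Hypothetical helper for illustration only; k=60 is an assumed default,
    not a value taken from haiku.rag.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up result lists standing in for the two retrieval modes.
vector_hits = ["chunk-a", "chunk-b", "chunk-c"]
fulltext_hits = ["chunk-b", "chunk-d", "chunk-a"]

fused = reciprocal_rank_fusion([vector_hits, fulltext_hits])
print(fused)  # ['chunk-b', 'chunk-a', 'chunk-d', 'chunk-c']
```

Results that appear in both lists (chunk-a and chunk-b here) rise above results found by only one retrieval mode, which is the reason to fuse by rank rather than compare raw vector and full-text scores directly.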
Where to go next
- Quickstart: install, index, chat.
- Skills: the rag and rag-analysis skills you compose into Pydantic AI agents.
- Python API: use haiku.rag from code.
- MCP server: expose haiku.rag to Claude Desktop or other AI assistants.
- Tuning: improve retrieval quality.
- Configuration: every setting.
License
MIT. Source on GitHub.