Quickstart
Install haiku.rag, index a document, and chat with it.
Install
You also need Ollama for the default embedding and answering models:
Prefer OpenAI?
Drop this into a haiku.rag.yaml next to where you'll run the CLI:
embeddings:
  model:
    provider: openai
    name: text-embedding-3-small
    vector_dim: 1536
qa:
  model:
    provider: openai
    name: gpt-4o-mini
Then export OPENAI_API_KEY="sk-..." and continue with the rest of this page. Any provider that Pydantic AI supports works the same way; see Providers.
Initialize
This creates a LanceDB database in your platform's user directory. Pass --db to any subcommand to use a different path:
Add a document
Add a file, a URL, or a whole folder:
Or paste text inline:
Each add-src call converts the source with Docling, splits it into chunks, embeds the chunks, and writes everything to LanceDB. Run haiku-rag list to see what you've added, or haiku-rag info for a database summary.
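The ingest steps described above can be pictured with a toy, stdlib-only sketch. It is not haiku.rag's code: `chunk_text`, `toy_embed`, and `add_document` are hypothetical names, a word-count splitter stands in for Docling-aware chunking, a hashed bag-of-words vector stands in for the real embedding model, and a Python list stands in for the LanceDB table.

```python
import math
import zlib

# Toy stand-ins for illustration only. haiku.rag itself converts sources with
# Docling, embeds with the configured model, and stores rows in LanceDB.


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text: str, dim: int = 16) -> list[float]:
    """Hash each word into one of `dim` buckets, then L2-normalize the counts."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def add_document(store: list[dict], doc_id: str, text: str) -> int:
    """Chunk, embed, and append one row per chunk; return the chunk count."""
    chunks = chunk_text(text)
    for order, chunk in enumerate(chunks):
        store.append({
            "document_id": doc_id,
            "order": order,  # position of the chunk within its document
            "content": chunk,
            "embedding": toy_embed(chunk),
        })
    return len(chunks)


store: list[dict] = []
n = add_document(store, "doc-1", "word " * 120)
print(n, [row["order"] for row in store])  # 3 [0, 1, 2]
```

In this sketch `dim` plays the role of the `vector_dim` setting: every stored vector has that length, so it has to match what the embedding model produces.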
Chat
Ask a question. The agent searches your documents, expands context around the hits, and answers with citations pointing back to the source page and section. Citations are expandable, with visual grounding so you can see the chunk highlighted on the original page. Follow-ups continue within the same session. Start a new session when you switch topics.
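Expanding context around the hits can be illustrated with a small sketch, assuming each chunk is stored with its position (`order`) in the document. `expand_hits` and its `radius` parameter are hypothetical, and haiku.rag's own expansion may work differently. The idea: widen each hit by a few neighboring chunks, then merge windows that overlap or touch so the agent reads one continuous passage instead of fragments.

```python
def expand_hits(hit_orders: list[int], num_chunks: int, radius: int = 1) -> list[tuple[int, int]]:
    """Widen each hit by `radius` chunks on both sides and merge overlapping
    or adjacent windows. Returns inclusive (start, end) ranges of chunk orders."""
    windows = sorted(
        (max(0, h - radius), min(num_chunks - 1, h + radius)) for h in set(hit_orders)
    )
    merged: list[tuple[int, int]] = []
    for start, end in windows:
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous window: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


chunks = [f"chunk {i}" for i in range(10)]
for start, end in expand_hits([2, 3, 9], num_chunks=len(chunks)):
    print(" ".join(chunks[start:end + 1]))
# chunk 1 chunk 2 chunk 3 chunk 4
# chunk 8 chunk 9
```

Merging means two hits in neighboring chunks yield one passage rather than two overlapping copies, which keeps the context the agent reads shorter.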
You can also ask a single question directly from the CLI without launching the TUI:
Where to go next
- Chat: sessions, citations, and the full TUI.
- CLI reference: every command.
- Python API: use haiku.rag in your own code.
- Skills: the rag and rag-analysis skills the client wraps.
- Tuning: better retrieval.
- Configuration: every setting.