Architecture

High-level overview of haiku.rag components and data flow.

System Overview

flowchart TB
    subgraph Sources["Document Sources"]
        Files[Files]
        URLs[URLs]
        Text[Text]
    end

    subgraph Processing["Processing Pipeline"]
        Converter[Converter]
        Chunker[Chunker]
        Embedder[Embedder]
    end

    subgraph Storage["Storage Layer"]
        LanceDB[(LanceDB)]
    end

    subgraph Agents["Agent Layer"]
        QA[QA Agent]
        Skill[RAG Skill]
        Research[Research Graph]
        RLM[RLM Agent]
    end

    subgraph Apps["Applications"]
        CLI[CLI]
        ChatTUI[Chat TUI]
        WebApp[Web App]
        Inspector[Inspector]
        MCP[MCP Server]
    end

    Sources --> Converter
    Converter --> Chunker
    Chunker --> Embedder
    Embedder --> LanceDB

    LanceDB --> Agents
    Agents --> Apps

Core Components

Storage Layer

LanceDB provides vector storage with full-text search. Data is stored in three record types:

DocumentRecord - Document metadata and full content
ChunkRecord - Text chunks with embeddings and structural metadata
SettingsRecord - Database configuration and version info

Repositories handle CRUD operations:

DocumentRepository - Create, read, update, delete documents
ChunkRepository - Chunk management and hybrid search
SettingsRepository - Configuration persistence
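
The record/repository split above can be sketched as follows. This is a minimal illustration and not the library's actual schema: the field names, the in-memory dict standing in for LanceDB, and the InMemoryDocumentRepository class are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class DocumentRecord:
    # Hypothetical fields; the real schema may differ.
    content: str
    uri: str | None = None
    metadata: dict = field(default_factory=dict)
    id: str = field(default_factory=lambda: uuid4().hex)


@dataclass
class ChunkRecord:
    # Hypothetical fields; the real schema may differ.
    document_id: str
    content: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)  # e.g. page numbers, headings


class InMemoryDocumentRepository:
    """Stand-in for a LanceDB-backed repository; keeps records in a dict."""

    def __init__(self) -> None:
        self._docs: dict[str, DocumentRecord] = {}

    def create(self, doc: DocumentRecord) -> DocumentRecord:
        self._docs[doc.id] = doc
        return doc

    def get(self, doc_id: str) -> DocumentRecord | None:
        return self._docs.get(doc_id)

    def update(self, doc: DocumentRecord) -> DocumentRecord:
        if doc.id not in self._docs:
            raise KeyError(doc.id)
        self._docs[doc.id] = doc
        return doc

    def delete(self, doc_id: str) -> bool:
        # Returns True only if a record was actually removed.
        return self._docs.pop(doc_id, None) is not None
```

Keeping CRUD behind a repository means callers never touch the storage engine directly, so the backing store can change without affecting the agents.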

Processing Pipeline

flowchart LR
    Source[Source] --> Converter
    Converter --> DoclingDoc[DoclingDocument]
    DoclingDoc --> Chunker
    Chunker --> Chunks[Chunks]
    Chunks --> Embedder
    Embedder --> Vectors[Vectors]
    Vectors --> DB[(LanceDB)]

Converters transform sources into DoclingDocuments:

docling-local - Local Docling processing
docling-serve - Remote processing via docling-serve

Chunkers split documents into semantic chunks:

Preserves document structure (tables, lists, code blocks)
Maintains provenance (page numbers, headings)
Configurable chunk size
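
To make the chunking properties above concrete, here is a simplified sketch that packs paragraphs into chunks up to a size limit while carrying heading and page provenance. It is not the chunker used by haiku.rag; the Chunk type, the (heading, page, text) input shape, and the character-based limit are assumptions chosen for brevity.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    heading: str  # provenance: the section heading the text came from
    page: int     # provenance: the page the section starts on


def chunk_sections(sections, max_chars=200):
    """Greedily pack paragraphs of each (heading, page, text) section.

    Paragraphs are kept whole, so a single paragraph longer than
    max_chars still becomes one (oversized) chunk.
    """
    chunks = []
    for heading, page, text in sections:
        buffer = ""
        for para in text.split("\n\n"):
            candidate = f"{buffer}\n\n{para}" if buffer else para
            if buffer and len(candidate) > max_chars:
                # Adding this paragraph would overflow: flush and start anew.
                chunks.append(Chunk(buffer, heading, page))
                buffer = para
            else:
                buffer = candidate
        if buffer:
            chunks.append(Chunk(buffer, heading, page))
    return chunks
```

Chunks never cross section boundaries in this sketch, which is one simple way to keep each chunk attributable to a single heading.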

Embedders generate vector representations:

Provider	Models
Ollama	nomic-embed-text, mxbai-embed-large
OpenAI	text-embedding-3-small, text-embedding-3-large
VoyageAI	voyage-3, voyage-code-3
vLLM	Any compatible model
LM Studio	Any compatible model

Agent Layer

Three agent types and a RAG skill for different use cases:

flowchart TB
    subgraph QA["QA Agent"]
        Q1[Question] --> S1[Search]
        S1 --> A1[Answer]
    end

    subgraph Skill["RAG Skill"]
        Q2[Question] --> Tools[Tool Selection]
        Tools --> S2[Search / Ask / Analyze]
        S2 --> A2[Answer]
        A2 --> State[RAG State]
        State -.-> Q2
    end

    subgraph Research["Research Graph"]
        Q3[Question] --> Plan[Plan Next]
        Plan --> SearchOne[Search One]
        SearchOne --> Eval[Evaluate]
        Eval -->|Continue| Plan
        Eval -->|Done| Synthesize[Synthesize]
    end

    subgraph RLM["RLM Agent"]
        Q4[Question] --> Code[Write Code]
        Code --> Execute[Execute]
        Execute --> Examine[Examine Results]
        Examine -->|Iterate| Code
        Examine -->|Done| A4[Answer]
    end

QA Agent - Single-turn question answering:

Searches for relevant chunks
Expands context around results
Generates an answer with optional citations
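
The three QA steps above might look roughly like this. The sketch assumes chunks are held as an ordered list so that "expanding context" can mean including neighbouring chunks; the function names and the search and llm callables are placeholders, not the library's API.

```python
def expand_context(hit_index, chunks, radius=1):
    """Return the hit chunk joined with up to `radius` neighbours per side."""
    start = max(0, hit_index - radius)
    end = min(len(chunks), hit_index + radius + 1)
    return " ".join(chunks[start:end])


def answer_question(question, chunks, search, llm, radius=1, cite=True):
    """Search, expand context around each hit, then ask the model.

    `search(question, chunks)` returns indices of relevant chunks and
    `llm(prompt)` returns a string; both are supplied by the caller.
    """
    hits = search(question, chunks)
    contexts = [expand_context(i, chunks, radius) for i in hits]
    prompt = "Context:\n" + "\n---\n".join(contexts) + f"\n\nQuestion: {question}"
    answer = llm(prompt)
    # Citations here are just the indices of the chunks that were retrieved.
    return {"answer": answer, "citations": hits if cite else []}
```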

RAG Skill - Multi-turn conversational RAG via haiku.skills:

Bundles search, list_documents, get_document, ask, analyze, and research tools
Keeps session state (citations, QA history, document filters) in a managed RAGState
Integrates with any pydantic-ai agent via SkillToolset
Powers both the Chat TUI and web application

Research Graph - Iterative research workflow:

Proposes one question at a time, evaluates the answer, then decides whether to continue
Prior answers let the planner skip redundant searches
Synthesizes a structured report
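
The plan, search, evaluate, synthesize cycle described above can be sketched as a plain loop. This is an illustration of the control flow only; the four callables, the max_iterations cap, and the (sub_question, answer) tuple format are assumptions, and the real graph may be structured differently.

```python
def run_research(question, plan_next, search_one, evaluate, synthesize,
                 max_iterations=3):
    """Iterate plan -> search -> evaluate, then synthesize a report.

    plan_next(question, answers) returns the next sub-question or None;
    search_one(sub_question) returns an answer;
    evaluate(question, answers) returns True when enough is known;
    synthesize(question, answers) builds the final report.
    """
    answers = []  # (sub_question, answer) pairs gathered so far
    for _ in range(max_iterations):
        # The planner sees prior answers, so it can skip redundant searches.
        sub_question = plan_next(question, answers)
        if sub_question is None:
            break
        answers.append((sub_question, search_one(sub_question)))
        if evaluate(question, answers):
            break
    return synthesize(question, answers)
```

Passing the accumulated answers back into the planner is what lets it avoid asking something it already knows.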

RLM Agent - Complex analytical tasks via code execution:

Writes Python code to explore the knowledge base
Executes the code in a sandboxed environment
Handles aggregation, computation, and multi-document analysis
Iterates until an answer is found
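
A rough sketch of the write, execute, examine loop follows. Note the loud assumption: Python's built-in exec is used here purely to show the control flow and is not a sandbox; the actual agent runs code in an isolated environment. The write_code and is_done callables stand in for the model and are hypothetical.

```python
import contextlib
import io


def execute(code, namespace):
    """Run code and capture stdout. NOT a real sandbox; illustration only."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception as exc:
        # Errors are returned as text so the next iteration can react to them.
        return f"error: {exc!r}"
    return buffer.getvalue()


def run_rlm(question, write_code, is_done, max_steps=5):
    """Loop: write code, execute it, examine the output, repeat.

    write_code(question, history) returns a code string;
    is_done(output) decides whether the output answers the question.
    """
    namespace = {}  # shared across steps, so variables persist
    history = []    # (code, output) pairs shown to the next write_code call
    for _ in range(max_steps):
        code = write_code(question, history)
        output = execute(code, namespace)
        history.append((code, output))
        if is_done(output):
            return output.strip()
    return None
```

Sharing one namespace across iterations lets a later step build on variables computed by an earlier one, which is what makes multi-step aggregation practical.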

Applications

Application	Interface	Use Case
CLI	Command line	Scripts, one-off queries, batch processing
Chat TUI	Terminal	Interactive conversations
Web App	Browser	Team collaboration, visual interface
Inspector	Terminal	Database exploration, debugging
MCP Server	Protocol	AI assistant integration

Data Flow

Document Ingestion

sequenceDiagram
    participant User
    participant CLI
    participant Converter
    participant Chunker
    participant Embedder
    participant DB as LanceDB

    User->>CLI: add-src document.pdf
    CLI->>Converter: Convert to DoclingDocument
    Converter-->>CLI: DoclingDocument
    CLI->>Chunker: Split into chunks
    Chunker-->>CLI: Chunks with metadata
    CLI->>Embedder: Generate embeddings
    Embedder-->>CLI: Vectors
    CLI->>DB: Store document + chunks
    DB-->>User: Document ID
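
The ingestion sequence above amounts to composing four stages. The sketch below wires them together with caller-supplied callables; the ingest function, the ListStore class, and the toy stages in the usage lines are all hypothetical stand-ins for the real converter, chunker, embedder, and LanceDB.

```python
def ingest(source, convert, chunk, embed, store):
    """Run convert -> chunk -> embed -> store and return the document ID."""
    document = convert(source)
    chunks = chunk(document)
    vectors = embed(chunks)
    if len(vectors) != len(chunks):
        raise ValueError("embedder must return one vector per chunk")
    return store(document, chunks, vectors)


class ListStore:
    """Toy storage that appends rows to a list instead of writing to a DB."""

    def __init__(self):
        self.rows = []

    def __call__(self, document, chunks, vectors):
        doc_id = len(self.rows) + 1
        self.rows.append((doc_id, document, list(zip(chunks, vectors))))
        return doc_id


# Usage with toy stages: strip whitespace, split on sentences,
# and "embed" each chunk as a one-dimensional vector of its length.
store = ListStore()
doc_id = ingest(
    "  First sentence. Second sentence.  ",
    convert=str.strip,
    chunk=lambda doc: doc.split(". "),
    embed=lambda chunks: [[float(len(c))] for c in chunks],
    store=store,
)
```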

Search and QA

sequenceDiagram
    participant User
    participant Agent
    participant Embedder
    participant DB as LanceDB
    participant LLM

    User->>Agent: Ask question
    Agent->>Embedder: Embed query
    Embedder-->>Agent: Query vector
    Agent->>DB: Hybrid search
    DB-->>Agent: Relevant chunks
    Agent->>Agent: Expand context
    Agent->>LLM: Generate answer
    LLM-->>Agent: Answer + citations
    Agent-->>User: Response
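
The diagram says only that the agent runs a hybrid search; it does not say how vector and full-text results are combined. One common approach is reciprocal rank fusion (RRF), sketched below as an assumption rather than a description of what haiku.rag or LanceDB does. The constant k=60 is the value conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each item scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by ID for determinism.
    """
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical chunk IDs returned by the two search modes.
vector_hits = ["c1", "c2", "c3"]
text_hits = ["c2", "c4", "c1"]
fused = reciprocal_rank_fusion([vector_hits, text_hits])
```

Items that appear in both lists accumulate score from each, so a chunk ranked reasonably well by both modes can outrank one that only a single mode found.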

Configuration

Configuration is resolved from the following sources, in order of precedence:

CLI args → Environment variables → haiku.rag.yaml → Defaults
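
Reading the chain above as a precedence order (CLI args first, defaults last) is an interpretation, and under that reading the lookup could be sketched with collections.ChainMap. The resolve_config function and the setting names in the usage lines are made up for the example and are not real haiku.rag keys.

```python
from collections import ChainMap


def resolve_config(cli_args, env, file_values, defaults):
    """Merge four layers; the earliest layer that sets a key wins.

    None values are treated as "not set" so that an omitted CLI flag
    does not mask a value from a lower-priority layer.
    """
    layers = [
        {key: value for key, value in layer.items() if value is not None}
        for layer in (cli_args, env, file_values, defaults)
    ]
    return dict(ChainMap(*layers))


# Hypothetical setting names, for illustration only.
config = resolve_config(
    cli_args={"chunk_size": 512, "provider": None},
    env={"chunk_size": 128, "provider": "ollama"},
    file_values={"provider": "openai", "db_path": "a.db"},
    defaults={"chunk_size": 256, "provider": "none", "db_path": "d.db"},
)
```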

Key configuration areas:

Storage - Database path, vacuum settings
Embeddings - Provider, model, dimensions
Processing - Chunk size, converter, chunker
Search - Limits, context expansion
QA/Research - Model, iterations, concurrency
Providers - Ollama, vLLM, docling-serve URLs

See Configuration for details.