Remote Processing

haiku.rag can use docling-serve for remote document processing and chunking, offloading resource-intensive operations to a dedicated service.

Overview

docling-serve is a REST API service that provides:

  • Document conversion (PDF, DOCX, PPTX, images, etc.)
  • Intelligent chunking with structure preservation
  • OCR capabilities for scanned documents
  • Table and figure extraction

When to Use docling-serve

Use local processing (default) when:

  • Working with small to medium document volumes
  • Running on development machines
  • Needing zero external dependencies
  • Processing simple document formats

Use docling-serve when:

  • Processing large volumes of documents
  • Working with complex PDFs requiring OCR
  • Running in production environments
  • Separating compute-intensive work from your application
  • Scaling document processing independently

Setup

The easiest way to use haiku.rag with docling-serve is the slim Docker image together with docker-compose. See examples/docker/docker-compose.yml for a complete setup that includes both services.
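
If you want to write the compose file yourself, a minimal sketch looks like the following; the haiku.rag image name, tag, and mount paths are illustrative, so treat examples/docker/docker-compose.yml as the authoritative reference:

services:
  haiku-rag:
    image: ghcr.io/ggozad/haiku.rag:slim  # illustrative image/tag, use the one from the example
    volumes:
      - ./haiku.rag.yaml:/app/haiku.rag.yaml  # assumed config location; point base_url at http://docling-serve:5001
      - ./data:/data
    depends_on:
      - docling-serve

  docling-serve:
    image: quay.io/docling-project/docling-serve
    ports:
      - "5001:5001"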

Running docling-serve Manually

See the official docling-serve repository for installation options. The quickest way is to use Docker:

docker run -p 5001:5001 quay.io/docling-project/docling-serve

To enable the web UI for debugging:

docker run -p 5001:5001 -e DOCLING_SERVE_ENABLE_UI=true quay.io/docling-project/docling-serve
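
Once the container is running you can sanity-check it before configuring haiku.rag. docling-serve is a FastAPI application, so with default settings its interactive API docs are served on the mapped port, and the web UI (when enabled) lives under /ui:

curl http://localhost:5001/docs      # interactive API documentation
# With DOCLING_SERVE_ENABLE_UI=true, open http://localhost:5001/ui in a browser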

Configuration

Configure haiku.rag to use docling-serve. See the Document Processing guide for all available options.

# haiku.rag.yaml
processing:
  converter: docling-serve  # Use remote conversion
  chunker: docling-serve    # Use remote chunking

providers:
  docling_serve:
    base_url: http://localhost:5001
    api_key: ""  # Optional API key for authentication

Features

Remote Document Conversion

When converter: docling-serve is configured, documents are sent to the docling-serve API for conversion:

from haiku.rag.client import HaikuRAG

async with HaikuRAG() as client:
    # PDF is processed by docling-serve
    doc = await client.create_document_from_source("complex.pdf")
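
Conversion is transparent to the rest of the client API, so retrieval works exactly as with local processing. A short sketch, assuming the client's usual search method that yields (chunk, score) pairs:

from haiku.rag.client import HaikuRAG

async with HaikuRAG() as client:
    # Conversion and chunking happen on the docling-serve instance
    doc = await client.create_document_from_source("complex.pdf")

    # Retrieval is unaffected by where the document was processed
    results = await client.search("What does the report conclude?")
    for chunk, score in results:
        print(score, chunk.content[:80])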

Remote Chunking

When chunker: docling-serve is configured, chunking is performed remotely:

processing:
  chunker: docling-serve
  chunker_type: hybrid              # or hierarchical
  chunk_size: 256
  chunking_tokenizer: "Qwen/Qwen3-Embedding-0.6B"
  chunking_merge_peers: true
  chunking_use_markdown_tables: false

Advanced Configuration

Custom Tokenizers

You can use any HuggingFace tokenizer model:

processing:
  chunking_tokenizer: "bert-base-uncased"  # Or any HF model
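
Chunk sizes are counted in the configured tokenizer's tokens, so the choice of tokenizer affects where chunks are cut. To get a feel for how a tokenizer splits your text, you can load it directly with the transformers library (purely illustrative, haiku.rag and docling-serve do this for you):

from transformers import AutoTokenizer

# Use the same model you set as chunking_tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "docling-serve converts and chunks documents remotely."
tokens = tokenizer.tokenize(text)
print(len(tokens), tokens)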

Chunking Strategies

Hybrid Chunking (default):

  • Best for most documents
  • Preserves semantic boundaries
  • Structure-aware splitting

Hierarchical Chunking:

  • Maintains document hierarchy
  • Better for deeply nested documents
  • Preserves parent-child relationships

processing:
  chunker_type: hierarchical

Table Handling

Control how tables are represented:

processing:
  chunking_use_markdown_tables: true  # Preserve table structure

  • false (default): Tables as narrative text
  • true: Tables as markdown format
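
As an illustration, with chunking_use_markdown_tables: true a small table inside a chunk is emitted roughly as markdown (the exact serialization depends on the document and docling version):

| Quarter | Revenue |
| ------- | ------- |
| Q1      | 1.2M    |
| Q2      | 1.4M    |

With the option disabled, the same cells are flattened into plain narrative text.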

VLM Picture Description with docling-serve

When using VLM picture description with docling-serve, the VLM API calls are made by the docling-serve container, not by haiku.rag. This requires additional configuration.

Enable Remote Services

docling-serve disables calls to remote services by default. To enable VLM picture description, start docling-serve with:

docker run -p 5001:5001 -e DOCLING_SERVE_ENABLE_REMOTE_SERVICES=true quay.io/docling-project/docling-serve
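
If you run docling-serve through docker-compose instead, the same variable goes in the service's environment section:

services:
  docling-serve:
    image: quay.io/docling-project/docling-serve
    ports:
      - "5001:5001"
    environment:
      - DOCLING_SERVE_ENABLE_REMOTE_SERVICES=true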

Docker Networking

When docling-serve runs in Docker and your VLM (e.g., Ollama) runs on the host, localhost inside the container refers to the container itself, not your host machine.

Use host.docker.internal to reach host services from within Docker:

# haiku.rag.yaml
processing:
  converter: docling-serve
  chunker: docling-serve
  conversion_options:
    picture_description:
      enabled: true
      model:
        provider: ollama
        name: ministral-3
        base_url: http://host.docker.internal:11434  # NOT localhost!
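
Note that host.docker.internal is not defined by default on Linux. You can map it to the host gateway when starting the container (Docker 20.10+):

docker run -p 5001:5001 \
  -e DOCLING_SERVE_ENABLE_REMOTE_SERVICES=true \
  --add-host=host.docker.internal:host-gateway \
  quay.io/docling-project/docling-serve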

Complete Example

  1. Start Ollama with a vision model on your host:

ollama pull ministral-3
ollama serve

  2. Start docling-serve with remote services enabled:

docker run -p 5001:5001 -e DOCLING_SERVE_ENABLE_REMOTE_SERVICES=true quay.io/docling-project/docling-serve

  3. Configure haiku.rag:

# haiku.rag.yaml
processing:
  converter: docling-serve
  chunker: docling-serve
  conversion_options:
    picture_description:
      enabled: true
      model:
        provider: ollama
        name: ministral-3
        base_url: http://host.docker.internal:11434

providers:
  docling_serve:
    base_url: http://localhost:5001

  4. Add a document:

haiku-rag add-src document.pdf
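
Once the document is added, you can verify the whole pipeline from the CLI, for example with the search command (adjust if your installed version's CLI differs):

haiku-rag search "picture description"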

Resources