Remote Processing
haiku.rag can use docling-serve for remote document processing and chunking, offloading resource-intensive operations to a dedicated service.
Overview
docling-serve is a REST API service that provides:
- Document conversion (PDF, DOCX, PPTX, images, etc.)
- Intelligent chunking with structure preservation
- OCR capabilities for scanned documents
- Table and figure extraction
When to Use docling-serve
Use local processing (default) when:
- Working with small to medium document volumes
- Running on development machines
- Want zero external dependencies
- Processing simple document formats
Use docling-serve when:
- Processing large volumes of documents
- Working with complex PDFs requiring OCR
- Running in production environments
- Want to separate compute-intensive tasks
- Need to scale document processing independently
Setup
Docker Compose (Recommended)
The easiest way to use haiku.rag with docling-serve is using the slim Docker image with docker-compose. See examples/docker/docker-compose.yml for a complete setup that includes both services.
Running docling-serve Manually
See the official docling-serve repository for installation options. The quickest way is using Docker:
To enable the web UI for debugging:
Configuration
Configure haiku.rag to use docling-serve. See the Document Processing guide for all available options.
# haiku.rag.yaml
processing:
converter: docling-serve # Use remote conversion
chunker: docling-serve # Use remote chunking
providers:
docling_serve:
base_url: http://localhost:5001
api_key: "" # Optional API key for authentication
Features
Remote Document Conversion
When converter: docling-serve is configured, documents are sent to the docling-serve API for conversion:
from haiku.rag.client import HaikuRAG
async with HaikuRAG() as client:
# PDF is processed by docling-serve
doc = await client.create_document_from_source("complex.pdf")
Remote Chunking
When chunker: docling-serve is configured, chunking is performed remotely:
processing:
chunker: docling-serve
chunker_type: hybrid # or hierarchical
chunk_size: 256
chunking_tokenizer: "Qwen/Qwen3-Embedding-0.6B"
chunking_merge_peers: true
chunking_use_markdown_tables: false
Advanced Configuration
Custom Tokenizers
You can use any HuggingFace tokenizer model:
Chunking Strategies
Hybrid Chunking (default):
- Best for most documents
- Preserves semantic boundaries
- Structure-aware splitting
Hierarchical Chunking:
- Maintains document hierarchy
- Better for deeply nested documents
- Preserves parent-child relationships
Table Handling
Control how tables are represented:
false(default): Tables as narrative texttrue: Tables as markdown format
VLM Picture Description with docling-serve
When using VLM picture description with docling-serve, the VLM API calls are made by the docling-serve container, not by haiku.rag. This requires additional configuration.
Enable Remote Services
docling-serve blocks external API calls by default. To enable VLM picture description, start docling-serve with:
docker run -p 5001:5001 -e DOCLING_SERVE_ENABLE_REMOTE_SERVICES=true quay.io/docling-project/docling-serve
Docker Networking
When docling-serve runs in Docker and your VLM (e.g., Ollama) runs on the host, localhost inside the container refers to the container itself, not your host machine.
Use host.docker.internal to reach host services from within Docker:
# haiku.rag.yaml
processing:
converter: docling-serve
chunker: docling-serve
conversion_options:
picture_description:
enabled: true
model:
provider: ollama
name: ministral-3
base_url: http://host.docker.internal:11434 # NOT localhost!
Complete Example
- Start Ollama with a vision model on your host:
- Start docling-serve with remote services enabled:
docker run -p 5001:5001 -e DOCLING_SERVE_ENABLE_REMOTE_SERVICES=true quay.io/docling-project/docling-serve
- Configure haiku.rag:
# haiku.rag.yaml
processing:
converter: docling-serve
chunker: docling-serve
conversion_options:
picture_description:
enabled: true
model:
provider: ollama
name: ministral-3
base_url: http://host.docker.internal:11434
providers:
docling_serve:
base_url: http://localhost:5001
- Add a document: