Python API

Use haiku.rag directly in your Python applications.

Basic Usage

from pathlib import Path
from haiku.rag.client import HaikuRAG

# Create a new database
async with HaikuRAG("path/to/database.lancedb", create=True) as client:
    # Your code here
    pass

# Open an existing database (will fail if database doesn't exist)
async with HaikuRAG("path/to/database.lancedb") as client:
    # Your code here
    pass

# Open in read-only mode (blocks writes)
async with HaikuRAG("path/to/database.lancedb", read_only=True) as client:
    results = await client.search("query")  # Read operations work
    # await client.create_document(...)  # Would raise ReadOnlyError

Note

Databases must be explicitly created with create=True or via haiku-rag init before use. Operations on non-existent databases will raise FileNotFoundError.

Note

Read-only mode is useful for safely accessing databases without risk of modification. It blocks all write operations and prevents settings from being saved.

Database Migrations

When upgrading haiku.rag to a version with schema changes, opening an existing database will raise MigrationRequiredError. Run haiku-rag migrate to apply pending migrations before using the database. See CLI Database Management for details.

Document Management

Creating Documents

From text:

doc = await client.create_document(
    content="Your document content here",
    uri="doc://example",
    title="My Example Document",  # optional human‑readable title
    metadata={"source": "manual", "topic": "example"}
)

From HTML content (preserves document structure):

html_content = "<h1>Title</h1><p>Paragraph</p><ul><li>Item 1</li></ul>"
doc = await client.create_document(
    content=html_content,
    uri="doc://html-example",
    format="html"  # parse as HTML instead of markdown
)

The format parameter controls how text content is parsed:

  • "md" (default) - Parse as Markdown
  • "html" - Parse as HTML, preserving semantic structure (headings, lists, tables)
  • "plain" - Plain text, no parsing (creates a simple text document)

Note

The document's content field stores the markdown export of the parsed document for consistent display. The original DoclingDocument structure is preserved in the docling_document field (zstd-compressed, without page images). Page images are stored separately in docling_pages.

From file:

doc = await client.create_document_from_source(
    "path/to/document.pdf", title="Project Brief"
)

From URL:

doc = await client.create_document_from_source(
    "https://example.com/article.html", title="Example Article"
)

Importing Pre-Processed Documents

If you process documents externally or need custom processing, use import_document():

from haiku.rag.store.models.chunk import Chunk

# Convert your source to a DoclingDocument
docling_doc = await client.convert("path/to/document.pdf")

# Create chunks (embeddings optional - will be generated if missing)
chunks = [
    Chunk(
        content="This is the first chunk",
        metadata={"section": "intro"},
        order=0,
    ),
    Chunk(
        content="This is the second chunk",
        metadata={"section": "body"},
        embedding=[0.1] * 1024,  # Optional: pre-computed embedding
        order=1,
    ),
]

# Import document with custom chunks
doc = await client.import_document(
    docling_document=docling_doc,
    chunks=chunks,
    uri="doc://custom",
    title="Custom Document",
    metadata={"source": "external-pipeline"},
)

The docling_document provides rich metadata for visual grounding, page numbers, and section headings. Content is automatically extracted from the DoclingDocument.

See Custom Processing Pipelines for building pipelines with convert(), chunk(), and embed_chunks().

Retrieving Documents

By ID:

doc = await client.get_document_by_id("document-id-string")

By URI:

doc = await client.get_document_by_uri("file:///path/to/document.pdf")

List all documents:

docs = await client.list_documents(limit=10, offset=0)

# Include full content and docling document (not loaded by default)
docs = await client.list_documents(include_content=True)

Filter documents by properties:

# Filter by URI pattern
docs = await client.list_documents(filter="uri LIKE '%arxiv%'")

# Filter by exact title
docs = await client.list_documents(filter="title = 'My Document'")

# Combine multiple conditions
docs = await client.list_documents(
    limit=10,
    filter="uri LIKE '%.pdf' AND title LIKE '%paper%'"
)

Count documents:

# Count all documents
total = await client.count_documents()

# Count with filter
pdf_count = await client.count_documents(filter="uri LIKE '%.pdf'")

Updating Documents

# Update content (triggers re-chunking)
await client.update_document(document_id=doc.id, content="New content")

# Update metadata only (no re-chunking)
await client.update_document(
    document_id=doc.id,
    metadata={"version": "2.0", "updated_by": "admin"}
)

# Update title only (no re-chunking)
await client.update_document(document_id=doc.id, title="New Title")

# Update multiple fields at once
await client.update_document(
    document_id=doc.id,
    content="New content",
    title="Updated Title",
    metadata={"status": "final"}
)

# Use custom chunks (embeddings optional - will be generated if missing)
custom_chunks = [
    Chunk(content="Custom chunk 1"),
    Chunk(content="Custom chunk 2", embedding=[...]),  # Pre-computed embedding
]
await client.update_document(document_id=doc.id, chunks=custom_chunks)

Notes:

  • Updates to only metadata or title skip re-chunking
  • Updates to content trigger re-chunking and re-embedding
  • Custom chunks with embeddings are stored as-is; missing embeddings are generated automatically
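The fill-missing-embeddings rule above can be sketched in plain Python. Here `embed` is a hypothetical embedding function and chunks are plain dicts for illustration; the real API uses Chunk models and the configured embedder:

```python
def fill_embeddings(chunks: list[dict], embed) -> list[dict]:
    """Keep pre-computed embeddings; generate one only where it is missing."""
    return [
        {**chunk, "embedding": chunk.get("embedding") or embed(chunk["content"])}
        for chunk in chunks
    ]
```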

Deleting Documents

await client.delete_document(doc.id)

Rebuilding the Database

from haiku.rag.client import RebuildMode

# Full rebuild (default) - re-converts from source files, re-chunks, re-embeds
async for doc_id in client.rebuild_database():
    print(f"Processed document {doc_id}")

# Re-chunk from stored content (no source file access)
async for doc_id in client.rebuild_database(mode=RebuildMode.RECHUNK):
    print(f"Processed document {doc_id}")

# Only regenerate embeddings (fastest, keeps existing chunks)
async for doc_id in client.rebuild_database(mode=RebuildMode.EMBED_ONLY):
    print(f"Processed document {doc_id}")

Rebuild modes:

  • RebuildMode.FULL - Re-convert from source files, re-chunk, re-embed (default)
  • RebuildMode.RECHUNK - Re-chunk from existing document content, re-embed
  • RebuildMode.EMBED_ONLY - Keep existing chunks, only regenerate embeddings
  • RebuildMode.TITLE_ONLY - Generate titles for untitled documents (no re-chunking or re-embedding)

Generating Titles

Generate a title for an existing document on demand:

title = await client.generate_title(doc)
if title:
    await client.update_document(document_id=doc.id, title=title)

Uses the same two-tier approach as automatic ingestion: structural extraction from DoclingDocument metadata first, with LLM fallback via processing.title_model. Unlike ingestion, this method does not catch exceptions — if the LLM call fails, the error propagates.

To batch-generate titles for all untitled documents, use RebuildMode.TITLE_ONLY:

async for doc_id in client.rebuild_database(mode=RebuildMode.TITLE_ONLY):
    print(f"Generated title for {doc_id}")

See Automatic Title Generation for configuration details.

Maintenance

Run maintenance to optimize storage and prune old table versions:

await client.vacuum()

This compacts tables and removes historical versions to keep disk usage in check. It’s safe to run anytime, for example after bulk imports or periodically in long‑running apps.

Atomic Writes and Rollback

Document create and update operations take a snapshot of table versions before any write and automatically roll back to that snapshot if something fails (for example, during chunking or embedding). This restores both the documents and chunks tables to their pre‑operation state using LanceDB’s table versioning.

  • Applies to: create_document(...), create_document_from_source(...), update_document(...), and internal rebuild/update flows.
  • Scope: Both document rows and all associated chunks are rolled back together.
  • Vacuum: Running vacuum() later prunes old versions for disk efficiency; rollbacks occur immediately during the failing operation and are not impacted.
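The snapshot-and-rollback pattern can be sketched with a toy versioned table. The class and method names here are illustrative, not LanceDB's actual API; the point is that both tables are restored to their pre-operation versions together:

```python
class VersionedTable:
    """Toy table that keeps every historical version of its rows."""

    def __init__(self):
        self._versions = [[]]  # version 0 is the empty table

    @property
    def version(self) -> int:
        return len(self._versions) - 1

    @property
    def rows(self) -> list:
        return self._versions[-1]

    def append(self, row) -> None:
        self._versions.append(self._versions[-1] + [row])

    def restore(self, version: int) -> None:
        # Restoring re-appends the old state as a new version.
        self._versions.append(list(self._versions[version]))


def atomic_write(documents: VersionedTable, chunks: VersionedTable, operation) -> None:
    """Snapshot both tables, run the operation, roll back together on failure."""
    doc_snapshot, chunk_snapshot = documents.version, chunks.version
    try:
        operation(documents, chunks)
    except Exception:
        documents.restore(doc_snapshot)
        chunks.restore(chunk_snapshot)
        raise
```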

Searching Documents

The search method performs native hybrid search (vector + full-text) using LanceDB with optional reranking for improved relevance:

Basic hybrid search (default):

results = await client.search("machine learning algorithms", limit=5)
for result in results:
    print(f"Score: {result.score:.3f}")
    print(f"Content: {result.content}")
    print(f"Document ID: {result.document_id}")

Search with different search types:

# Vector search only
results = await client.search(
    query="machine learning",
    limit=5,
    search_type="vector"
)

# Full-text search only
results = await client.search(
    query="machine learning",
    limit=5,
    search_type="fts"
)

# Hybrid search (default - combines vector + fts with native LanceDB RRF)
results = await client.search(
    query="machine learning",
    limit=5,
    search_type="hybrid"
)

# Process results
for result in results:
    print(f"Relevance: {result.score:.3f}")
    print(f"Content: {result.content}")
    print(f"From document: {result.document_id}")
    print(f"Document URI: {result.document_uri}")
    print(f"Document Title: {result.document_title}")  # when available
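Hybrid search fuses the vector and full-text rankings with reciprocal rank fusion (RRF). A minimal sketch of the standard RRF formula, where each item scores the sum of 1 / (k + rank) over the lists it appears in (k=60 is the common default, not necessarily LanceDB's internal value):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of IDs; items ranked well in several lists rise."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

An item that appears near the top of both the vector and the FTS list outranks one that appears in only a single list.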

Filtering Search Results

Filter search results to only include chunks from documents matching specific criteria:

# Filter by document URI pattern
results = await client.search(
    query="machine learning",
    limit=5,
    filter="uri LIKE '%arxiv%'"
)

# Filter by exact document title
results = await client.search(
    query="neural networks",
    limit=5,
    filter="title = 'Deep Learning Guide'"
)

# Combine multiple filter conditions
results = await client.search(
    query="AI research",
    limit=5,
    filter="uri LIKE '%.pdf' AND title LIKE '%paper%'"
)

# Filter with any search type
results = await client.search(
    query="transformers",
    limit=5,
    search_type="vector",
    filter="uri LIKE '%huggingface%'"
)

Note: Filters apply to document properties only. Available columns for filtering:

  • id - Document ID
  • uri - Document URI/URL
  • title - Document title (if set)
  • created_at, updated_at - Timestamps
  • metadata - Document metadata (as string; use LIKE for pattern matching)

Expanding Search Context

Expand search results with surrounding content from the document:

# Get initial search results
search_results = await client.search("machine learning", limit=3)

# Expand with section-bounded context
expanded_results = await client.expand_context(search_results)

for result in expanded_results:
    print(f"Expanded content: {result.content}")

Context expansion is automatic and section-aware. For structured documents (with section headers), expansion includes the entire section containing the match. For sections that exceed the budget or are too small (e.g., a title+authors area), expansion grows outward item-by-item from the match center, skipping noise labels (footnotes, page headers) — this naturally crosses into adjacent sections until the budget is filled. For unstructured documents, expansion grows outward item-by-item. Results without doc_item_refs (e.g., custom chunks passed to import_document) pass through unexpanded.
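The outward growth described above can be sketched as follows: starting from the matched item, alternately consider the previous and next items, skip noise labels without consuming budget, and stop once the next real item no longer fits. The item structure and label names here are illustrative, not haiku.rag's internal representation:

```python
NOISE_LABELS = {"footnote", "page_header"}  # illustrative label set

def expand_outward(items: list[dict], match_index: int, budget: int) -> list[dict]:
    """Grow a window around items[match_index], item by item, within a char budget."""
    selected = {match_index}
    used = len(items[match_index]["text"])
    lo, hi = match_index - 1, match_index + 1
    while lo >= 0 or hi < len(items):
        for candidate in (lo, hi):
            if not (0 <= candidate < len(items)):
                continue
            item = items[candidate]
            if item["label"] in NOISE_LABELS:
                continue  # skip noise but keep growing past it
            if used + len(item["text"]) > budget:
                return [items[i] for i in sorted(selected)]
            selected.add(candidate)
            used += len(item["text"])
        lo -= 1
        hi += 1
    return [items[i] for i in sorted(selected)]
```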

Configuration:

  • search.max_context_chars: Maximum characters in expanded context. Default: 10000.

Smart Merging: When expanded results overlap within the same document, they are automatically merged into a single result with continuous content and the highest relevance score.
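Overlap merging can be sketched as interval merging over (start, end) character spans within a document, keeping the maximum relevance score. The span fields here are illustrative:

```python
def merge_overlapping(results: list[dict]) -> list[dict]:
    """Merge results whose spans overlap, keeping the highest score."""
    merged: list[dict] = []
    for r in sorted(results, key=lambda r: r["start"]):
        if merged and r["start"] <= merged[-1]["end"]:
            last = merged[-1]
            last["end"] = max(last["end"], r["end"])
            last["score"] = max(last["score"], r["score"])
        else:
            merged.append(dict(r))
    return merged
```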

Question Answering

Ask questions about your documents:

answer, citations = await client.ask("Who is the author of haiku.rag?")
print(answer)
for cite in citations:
    print(f"  [{cite.chunk_id}] {cite.document_title or cite.document_uri}")

Customize the QA agent's behavior with a custom system prompt:

custom_prompt = """You are a technical support expert for WIX.
Answer questions based on the knowledge base documents provided.
Be concise and helpful."""

answer, citations = await client.ask(
    "How do I create a blog?",
    system_prompt=custom_prompt
)

Filter to specific documents:

answer, citations = await client.ask(
    "What are the main findings?",
    filter="uri LIKE '%paper%'"
)

The QA agent searches your documents for relevant information and uses the configured LLM to generate an answer. The method returns a tuple of (answer_text, list[Citation]). Citations include page numbers, section headings, and document references.

The QA provider and model are configured in haiku.rag.yaml or can be passed directly to the client (see Configuration).

See also: Agents for details on the QA agent and the multi‑agent research workflow.

Analysis

Answer complex analytical questions via code execution:

# Aggregation across documents
result = await client.analyze("Which quarter had the highest revenue?")
print(result.answer)    # The answer
print(result.program)   # The final consolidated program

# Computation within a document set
result = await client.analyze(
    "What is the average deal size mentioned in these contracts?",
    filter="uri LIKE '%contracts%'"
)

# Multi-document comparison
result = await client.analyze(
    "What changed between these two versions of the policy?",
    documents=["Policy v1.0", "Policy v2.0"]
)

The analysis agent writes and executes Python code in a sandboxed environment to solve problems that traditional RAG struggles with: aggregation, computation, and multi-document analysis.

See Analysis Agent for details on capabilities and configuration.

Building Custom Agents

haiku.rag provides a RAG skill built on haiku.skills that bundles all capabilities into a composable agent:

from pydantic_ai import Agent
from haiku.rag.skills.rag import create_skill
from haiku.skills.agent import SkillToolset
from haiku.skills.prompts import build_system_prompt

skill = create_skill(db_path=db_path, config=config)
toolset = SkillToolset(skills=[skill])

agent = Agent(
    "openai:gpt-4o",
    instructions=build_system_prompt(toolset.skill_catalog),
    toolsets=[toolset],
)

result = await agent.run("What are the main findings?")

See Toolsets for the full API reference.