Changelog

Unreleased

0.70.0 - 2026-07-25

Added

HaikuRAG.ask and HaikuRAG.analyze accept images: Sequence[bytes], attached to the question as model input; requires vision: true on the driving model.
haiku-rag ask and haiku-rag analyze accept --image PATH (repeatable).
MCP ask_question and analyze tools accept images_base64.
Chat TUI: Ctrl+I opens an image picker; attached images insert [Image #N] tokens in a multi-line prompt and are sent to the model with the message.

Fixed

Chat and inspector TUIs report the actual database-open error instead of an AttributeError from teardown.

0.69.0 - 2026-07-24

Added

Native deferred Pydantic AI RAGCapability and AnalysisCapability implementations under haiku.rag.capabilities, with namespaced host state and lazy per-run database and sandbox resources.
Prior-turn RAG and analysis tool results are compacted before model requests while current-turn evidence remains intact.
Per-question capability request limits force a final answer from gathered evidence by removing only the exhausted capability's tools; unrelated agent and capability tools remain available.

Changed

Require pydantic-ai-slim>=2.11,<3; the vertexai optional extra now installs Pydantic AI's google extra.
The chat TUI consumes native Pydantic AI stream events. The web example uses the standard AGUIAdapter and emits one final state snapshot instead of forwarding sub-agent activity and per-tool state events.
Chat capability selection is now haiku-rag chat --capability/-c {rag,analysis}. Migrate from --skill/-s.
Evaluation targets are now rag-capability and analysis-capability, and the model override is --capability-model. Migrate from rag-skill, analysis-skill, and --skill-model.

Removed

The haiku.skills dependency, haiku.rag.skills modules, Python entry-point discovery, and sub-agent execution layer. Migrate create_skill(...) plus SkillToolset usage to haiku.rag.capabilities.*.create_capability(...) passed through Agent(capabilities=[...]).
The haiku-rag create-skill package generator. Compose native capabilities directly and package application-specific instructions and data in the consuming project.
Legacy sub-agent ActivitySnapshotEvent plumbing and per-tool StateDeltaEvent generation.

0.68.0 - 2026-07-24

Added

reranking.multimodal config flag: picture chunks are sent to a vllm reranker as image documents.

Fixed

list, get, and visualize no longer require an embeddings config matching the database.

0.67.3 - 2026-07-23

Changed

Security dependency updates: pillow 12.3.0, mcp 1.28.1, pydantic-ai 1.102.0, torch 2.13.0, transformers 5.5.0, soupsieve 2.9.1, joserfc 1.7.4, pyasn1 0.6.4, setuptools 83.0.0; frontend next 16.2.11, sharp 0.35.3, fast-uri 3.1.4, dompurify 3.4.12, body-parser 1.20.6.

0.67.2 - 2026-07-23

Added

SearchResult.document_meta and Citation.document_meta carry the parent document's metadata.

0.67.1 - 2026-07-22

Added

hotpotqa evaluation dataset.

Changed

cite skill tool names unresolvable chunk ids in its response when at least one id resolves.

Fixed

Document deletion no longer raises ConfigMismatchError on embedding config drift.

0.67.0 - 2026-07-16

Added

Database tags: haiku-rag tag create/list/delete/restore, tags shown in history. tag restore creates a before-restore-* safety tag before changing live state. Vacuum retains versions back to the oldest tag.

Changed

lancedb bumped to 0.34.0.

Removed

--before global flag and the before constructor arguments on HaikuRAG, Store, HaikuRAGApp, ChatApp/run_chat, and InspectorApp/run_inspector. There is no read-only replacement; create tags prospectively before important changes and use tag restore during a maintenance window.

Fixed

docling-local text conversion no longer misroutes markdown/HTML content whose first bytes collide with a binary magic signature (e.g. BM, ID3) to an image or audio backend.

0.66.0 - 2026-07-14

Changed

Unknown reranking.model.provider raises ValueError instead of silently disabling reranking.
search.max_context_chars default lowered from 10000 to 5000.

Removed

mxbai reranking provider and extra; the transformers<5.0.0 cap goes with it. Migrate to provider: cross-encoder with the same model name (mixedbread-ai/mxbai-rerank-base-v2), installed via the cross-encoder extra.

Fixed

Context expansion no longer drops a retrieved result from a merged group when clipping to search.max_context_chars; groups whose clip would evict a constituent's evidence are returned as separate expanded results.
Context expansion now fills missing page metadata from input search results when the referenced items survive in the expanded result.

0.65.1 - 2026-07-10

Added

debug-evals and debug-ingestion Claude skills (.claude/skills/) with canned Logfire queries for eval-run and ingestion debugging.

Changed

Logfire spans carry service.version and a per-process service.name (haiku-ingester, haiku-rag, haiku-rag-app); OTEL_SERVICE_NAME / LOGFIRE_SERVICE_NAME override the default.
Each docling-serve request emits a docling_serve.request span carrying the instance url and attempt.
The ingester worker circuit breaker opening emits a ingester.worker breaker opened Logfire event with source_id, threshold, cooldown_s.

0.65.0 - 2026-07-09

Added

SearchResult.chunk_ids and Citation.chunk_ids carry the chunk ids merged into an expanded result.
Citation.doc_item_refs carries the items the model saw; visualize_chunk accepts a refs argument and resolves bounding boxes from them, so visualizations match the cited content instead of re-expanding.
Chunk visualizations draw matched content in a stronger highlight than surrounding context.
haiku-rag visualize --no-expand highlights only the chunk itself, without its expanded context (visualize_chunk gains an expand argument).

Changed

Duplicate images within a document produce a single picture chunk.
Pictures smaller than processing.min_picture_size pixels on the smaller side (default 64) no longer become picture chunks; 0 disables the filter.

Fixed

vLLM embedding and vLLM/Jina reranking reuse one HTTP client across requests instead of opening one per request.
Picture and table search hits expand within their section, never across section boundaries.
A merged search result anchors its chunk_id on the highest-scoring constituent chunk instead of the one earliest in the document.
A clipped citation's page_numbers and doc_item_refs reflect only the content that survived the budget, not the full expanded range.

0.64.0 - 2026-07-08

Added

update_document accepts a uri argument to change a document's URI.
docling-serve requests fail over to another instance on transport/5xx errors and skip instances whose circuit breaker is open; tune via providers.docling_serve.max_attempts and providers.docling_serve.circuit_breaker.

Fixed

Concurrent ingestion of the same URI no longer creates duplicate documents; the URI is re-checked under the write lock and a colliding create becomes an update.
Document filters can reference id (list_documents/count_documents/search/analyze filter=); the document_meta identity column is renamed document_id → id with a migration.

0.63.2 - 2026-07-03

Changed

Custom rerankers override RerankerBase._rerank instead of rerank; the base rerank handles the empty-input short-circuit.

Fixed

A search result whose matched refs include a figure's caption now attaches that figure's picture bytes for vision-capable models, resolved through the caption's adjacent picture item.

0.63.1 - 2026-06-29

Changed

Require haiku.skills>=0.18.0.

Added

orb_multimodal_nemotron pre-built evaluation database (nvidia/llama-nemotron-embed-vl-1b-v2 embedder).
t2_finqa pre-built evaluation database (T²-RAGBench FinQA).
Reference configs under evaluations/configs/ for the pre-built evaluation databases (wix, orb_text, orb_multimodal, orb_multimodal_nemotron, t2_finqa).

0.63.0 - 2026-06-28

Added

doctor shows a spinner naming the check currently in progress while it runs.

Changed

doctor near-duplicate detection compares whole-document embedding centroids instead of per-chunk overlap, and no longer flags a small document contained in a larger one. The per-chunk check could take hours and tens of GB of memory on corpora with large documents. The centroid check is independent of document size and reduces each document to a centroid during the vector scan without a second copy of the matrix.
Breaking: the doctor.duplicates keys containment_threshold, candidate_threshold, and twin_similarity are replaced by a single similarity_threshold (default 0.97). min_chunks is unchanged.
doctor --duplicates-out YAML reports similarity per document instead of contained_fraction.

0.62.1 - 2026-06-27

Fixed

Context expansion now hard-caps each expanded search result at search.max_context_chars, anchored on the matched chunk. A single oversized document_items row (e.g. a spreadsheet converted to one table) no longer pushes a result far past the budget and overflows the model context window.

0.62.0 - 2026-06-26

Added

doctor reports groups of near-duplicate documents (revisions sharing most of their chunks), flagging the largest member as the likely one to keep; --duplicates-out PATH writes the groups to a YAML file. Tuned via the doctor.duplicates config block (containment_threshold, candidate_threshold, twin_similarity, min_chunks).

Security

Bumped dependencies in uv.lock to patched versions for known advisories: aiohttp 3.14.1, cryptography 49.0.0, idna 3.18, langchain-core 1.4.8, langchain-text-splitters 1.1.2, langsmith 0.9.1, lxml 6.1.1, pillow 12.2.0, pydantic-settings 2.14.2, pyjwt 2.13.0, pytest 9.1.1, python-multipart 0.0.32, requests 2.34.2, starlette 1.3.1, urllib3 2.7.0, vcrpy 8.2.1.
app/frontend: bumped next to 16.2.9 and @copilotkit/* to 1.61.1 (clearing their transitive advisories) and added openai (now a peer dependency of @copilotkit/runtime).

Changed

Ingester reaping is lease-based: a worker renews last_heartbeat_at on its in-flight jobs every heartbeat_interval_s, and the reaper reclaims a claim only once its lease is older than lease_ttl_s. A job slower than the timeout is no longer reaped and reprocessed while still running. Adds queue schema v2 (last_heartbeat_at on jobs); existing queue databases migrate in place on open.
Breaking: ingester.workers.claim_timeout_s removed; set lease_ttl_s (default 120) and heartbeat_interval_s (default 30) instead. WorkerConfig rejects unknown keys.
A bare ${VAR} in YAML config now raises MissingEnvVarError when the variable is set but empty, matching the unset case. Use ${VAR:-default} to allow an empty/absent value.

Fixed

A failed FTS index build is logged at WARNING instead of DEBUG, so silent full-text search degradation is visible.
Ingester worker ids are now globally unique ({pid}-{uuid}-{n}); the claimed_by guards on job completion/reschedule/release distinguish workers across processes sharing one Postgres queue.

0.61.2 - 2026-06-24

Fixed

fastmcp is now a direct dependency instead of being pulled via the pydantic-ai-slim[fastmcp] extra, which pydantic-ai-slim 2.0 dropped. Clean installs resolving that release got only fastmcp-slim (client), so haiku-rag mcp failed with ImportError: FastMCP server support is not installed.

0.61.1 - 2026-06-24

Fixed

zstd compression uses a fresh zstandard compressor/decompressor per call instead of shared module-level singletons, fixing process segfaults when ingester workers compress documents concurrently (Python < 3.14).
Ingestion compresses the DoclingDocument structure directly from the in-memory dict, removing one full-size serialized copy from per-document peak memory.
Document item extraction builds one MarkdownDocSerializer per document (reused across tables) and derives description-less picture text from captions, instead of constructing a serializer per picture and table.

0.61.0 - 2026-06-23

Added

haiku-rag doctor checks a database for consistency (orphaned chunks/items, chunk-less documents classified by content and embedder modality, dangling doc_item_refs, vector-dimension mismatch, unembedded chunks, missing picture data, settings/embedding drift, pending migrations, vector-index coverage, provider API keys) and probes configured provider endpoints (Ollama /api/tags with model presence, docling-serve /health, OpenAI-compatible/vLLM /models); exits 1 when any check fails.
embeddings.model.multimodal (bool, default false) gates image embedding; supports_images derives from it instead of the provider name.
VoyageAI multimodal embedder (provider: voyageai, multimodal: true, e.g. voyage-multimodal-3) embedding text and pictures into a shared vector space.
Cohere multimodal embedder (provider: cohere, multimodal: true, e.g. embed-v4.0) embedding text and pictures into a shared vector space.

Changed

provider: vllm is text-only unless embeddings.model.multimodal: true is set. Existing multimodal vLLM configs must add the flag; multimodal is not part of the stored embedding identity, so changing it raises no drift error — re-ingest or rebuild to add or drop picture chunks.

Fixed

More CPU-bound steps run off the event loop: docling item extraction, local chunking, rebuild picture-description patching, and picture-chunk merging no longer stall concurrent workers on large documents.
docling-serve converter requests include_page_images, so generate_page_images produces page rasters on docling-serve versions that gate them behind that flag.
docling-serve chunker stores chunk bodies via raw_text, so section headings aren't duplicated into chunk content on docling-serve versions where the chunk text field is heading-contextualized.

0.60.0 - 2026-06-22

Added

haiku-ingester run-batch --dry-run writes a YAML manifest of planned upserts/deletes without mutating queue jobs or sync_state; run-batch --manifest <path> replays that frozen changeset without another discovery sweep.
haiku-ingester run-batch and run-batch --manifest show an interactive progress bar with ETA while draining queued jobs.

Fixed

CPU-bound ingest steps now run off the event loop: Docling document serialization, docling-serve zip parsing, split-PDF concatenation, fetched-body temp writes, and filesystem read/hash work no longer stall concurrent ingester workers on large image-bearing documents.

0.59.1 - 2026-06-19

Changed

textual-image moved to base dependencies; ask/analyze/visualize render image citations without the tui extra.

Fixed

Embedded-PDF attachment scanning runs in a worker thread instead of on the event loop; ingesting PDFs with embedded attachments no longer stalls concurrent workers.

0.59.0 - 2026-06-16

Added

Custom ingester sources: a source config with type: plugin names a source factory registered under the haiku.rag.sources entry-point group via its plugin field, with an opaque options mapping the plugin validates itself. Only the referenced plugin is imported.

Changed

Bump docling>=2.102.2,<3.0.0 and docling-core>=2.82.0,<3.0.0; the <3.0.0 cap holds the DoclingDocument schema at 1.10.0.
Relax opencv-python-headless to >=4.6.0.66,<5.0.0.0 (was >=4.13.0.92) to match docling-ibm-models' declared range.
haiku.rag.metadata_providers callables take a third argument, the fetched FetchResult: __call__(source_id, uri, result). The provider runs after fetch instead of before; on revision-unchanged sweeps it is skipped and existing provider metadata is preserved.

Fixed

SQLite ingester queue runs with a multi-connection pool (pool_size=5, max_overflow=5) instead of a single connection. API reads (/stats, /jobs) no longer time out with QueuePool limit of size 1 reached while workers hold the connection.
Permanently-failed ingester documents (revisioned sources) record their revision in sync_state, so discovery no longer re-enqueues them every sweep; re-attempted only when the file's revision changes or via explicit retry/rebuild.
Retrying a dead ingester job (/jobs/{id}/retry, /dlq/{id}/retry) when a live job already exists for the same (source_id, uri) returns the live job instead of failing with a 500 (uq_jobs_live violation).

0.58.0 - 2026-06-15

Added

Ingester metadata providers: a source config's metadata_provider names a callable registered under the haiku.rag.metadata_providers entry-point group; the ingester calls it per document with (source_id, uri) and attaches the returned dict as document metadata. System-derived keys (md5, source_revision, content_type) take precedence on collision.

Changed

Mutable document attributes (uri, title, metadata, created_at, updated_at) moved from the documents table into a new document_meta table (1:1 on document_id); metadata/title/source_revision updates no longer rewrite the docling blobs. Migration v0_58_0 relocates existing data and runs a one-time vacuum to reclaim prior bloat.
Background auto-vacuum is throttled to at most once per 5 minutes; a final vacuum on close collapses any throttled writes. Sustained ingestion no longer triggers back-to-back compaction of the documents table.

Fixed

WebDAV source follows HTTP redirects (PROPFIND and GET); front-ended servers that 301/302 on trailing-slash or virtual-host rewrites no longer error.
All in-process pdfium access (page slicing and embedded-attachment scanning) is serialized under a single shared lock. Concurrent ingester workers no longer corrupt libpdfium's global state, which previously failed valid PDFs with "Data format error".
Embedded PDF attachment extension is derived from the attachment filename, not the parent's synthetic ...#attachment=<name> URI; non-PDF attachments (e.g. .joboptions) are no longer misrouted to docling's PDF backend, and unsupported extensions are skipped.

0.57.0 - 2026-06-11

Added

ingester.api.root_path (and haiku-ingester serve --root-path) serves the control plane under a sub-path for reverse-proxying; forwarded to FastAPI/uvicorn root_path and reflected in the dashboard's <base href>.
YAML config string values support ${VAR} / ${VAR:-default} environment-variable interpolation, expanded at load time. ${VAR} referencing an unset variable raises MissingEnvVarError; $$ is a literal $.

0.56.0 - 2026-06-09

Added

Ingester control plane gains GET /database (LanceDB snapshot: stored version, embeddings, per-table counts/sizes, vector index, pending migrations, package versions — the data haiku-rag info prints) and GET /config (full effective config as YAML, secrets redacted). The dashboard surfaces both as on-demand collapsible Database and Configuration panels.
HaikuRAG.import_documents(imports) batch-imports prepared documents (DocumentImport), writing the documents, chunks, and document_items tables once each regardless of batch size. DocumentRepository.create accepts Document | list[Document].

Changed

Ingester control-plane per-request access logs are emitted only when the haiku.rag logger is at DEBUG.

Fixed

Ingester worker circuit breaker is now per-source: a streak of transient failures pauses claims only for the affected source's jobs while healthy sources keep flowing, instead of pausing the whole worker pool. Paused sources are excluded at the claim query.

0.55.1 - 2026-06-08

Changed

Bump haiku.skills>=0.17.2: skill runs use a retry budget of 3 for tool calls and output validation (was 1).
Evaluations disable Logfire scrubbing (scrubbing=False) so financial answers containing words like "authorized" aren't redacted from logged outputs.

Added

analysis.max_executions (default 15): caps execute_code calls per analysis question. Past the cap the tool returns a notice telling the skill to answer from what it has, instead of spiralling into request_limit and returning nothing. The analysis skill sets request_limit to 30 as a backstop.
t2_finqa and t2_tatdqa evaluation datasets (T²-RAGBench subsets, G4KMU/t2-ragbench): financial-report PDFs ingested via docling with uri = context_id and gold retrieval keyed on context_id. QA is scored with a deterministic NumberMatchEvaluator (relative tolerance 0.01) via the new DatasetSpec.qa_evaluator, bypassing the LLM judge.
evaluations run --filter-ids <file>: run QA on just the case ids listed in a file (failure-subset rerun); retrieval is unaffected.
evaluations/scripts/build_t2_submission.py + evaluations.submission: build a T²-RAGBench leaderboard submission JSONL ({id, subset, context_id, prediction}) by joining QA predictions with retrieval rankings by question.

Fixed

Skill tools (search/cite/list_documents/get_document) and the analysis sandbox serialize access to the shared LanceDB connection through one lock, so a turn's concurrently executed tool calls no longer trigger RuntimeError: Already borrowed.

0.55.0 - 2026-06-05

Added

haiku-rag rebuild --set-embedder: adopt the current embedder identity (provider/name) without re-embedding, when the vector dimension is unchanged.

Fixed

Opening a database no longer writes to it: reads no longer rewrite the stored embedding settings or change the stored version, and the version is never downgraded. Embedding provider/name drift (matching vector_dim) warns on read-only opens and raises ConfigMismatchError on writable opens; reconcile with rebuild --set-embedder.
Read CLI verbs (list, get, search, visualize, ask, analyze, inspect, chat, info, history) open the database read-only.
Analysis-skill execute_code no longer crashes with LanceDB Already borrowed when sandboxed code reads the document VFS (content.txt, items.jsonl, toc.json); reads run on one background event loop with a single read-only connection for the sandbox's lifetime.

0.54.0 - 2026-06-04

Added

ingester.queue.dburi: a SQLAlchemy async URL (e.g. postgresql+asyncpg://user:pw@host/db) points the ingester queue at a database server. SQLite remains the default when unset. The Postgres path claims jobs with FOR UPDATE SKIP LOCKED, so multiple ingester processes can share one queue.

0.53.0 - 2026-06-03

Added

ingester.queue.retention_days (default 30): the reaper deletes succeeded/dead jobs whose completed_at is older than the window. null disables pruning.

Fixed

rebuild --embed-only re-embeds picture chunks through the image path instead of overwriting their vectors with a text embedding of the caption.
DELETE jobs re-check the source with head() before deleting and skip when the resource is back, so an atomic-rename save (vim, git checkout) no longer blackholes a live document.

0.52.0 - 2026-06-01

Added

haiku-ingester run-batch: one discover sweep across every configured source, drains the queue, then exits. Orphan deletion requires a persisted ingester.db.
max_file_size (bytes) on every source type rejects oversized files before they are read into memory; they dead-letter as a permanent FileTooLargeError. FS and S3 check the size from metadata before fetching; HTTP and WebDAV rely on a Content-Length response header, so a server that omits it is not enforced.

Changed

Idle workers wake on enqueue via an asyncio.Condition instead of polling on poll_idle_interval_s.
sync_state writes are batched into one transaction per sweep (SyncStateRepo.batch_upsert) instead of one commit per file.
HTTP and WebDAV sources reuse a single httpx.AsyncClient across head/fetch/discover instead of opening one per request.
Periodic poll sweeps start after a random jitter of 0–25% of poll_interval_s.
New partial index idx_jobs_succeeded_completed backs the dashboard rolling-throughput query.

Removed

haiku-ingester run-once. Use run-batch to ingest configured sources, or serve for continuous operation.

Fixed

Revision-less HTTP/S3/WebDAV resources (no ETag or Last-Modified) emit UNCHANGED once known instead of re-ingesting on every sweep.
Deleted, unreadable, and directory paths (FileNotFoundError, PermissionError, IsADirectoryError, NotADirectoryError) classify as permanent failures instead of exhausting retries first.
FS discover() and the watch loop no longer crash when a file is removed between listing and stat().
HTTP discover() catches only httpx.TransportError; other exceptions propagate instead of being emitted as UPSERT.
A sync_state write that fails after a job is marked succeeded is logged instead of crashing the worker.

0.51.0 - 2026-05-29

Added

PDF /EmbeddedFiles attachments are ingested as separate Documents linked to the wrapper through metadata.parent_uri. Child URIs use a #attachment=<percent-encoded-name> fragment on the parent URI. Re-ingest reconciles the child set (add / update / delete) against the wrapper's current attachments; delete_document cascades through parent_uri. Nested chains are bounded at 3 levels. Toggle with processing.extract_pdf_attachments (default true).

Changed

A successful DELETE job auto-prunes dead jobs with the same (source_id, uri). New JobRepo.prune_dead(source_id, uri).
Reranker built once per HaikuRAG client instead of per search() call.
Embedder built once per HaikuRAG client instead of rebuilt per ingest/search operation. embed_chunks now takes it explicitly: embed_chunks(chunks, embedder, config) (was embed_chunks(chunks, config)). New HaikuRAG.embedder property exposes it.
HaikuRAG close runs a final vacuum after draining in-flight background vacuums when storage.auto_vacuum is enabled.

0.50.0 - 2026-05-27

Added

haiku-ingester service for continuous document ingestion. Persistent SQLite job queue, async worker pool with retries and dead-letter queue, FS/HTTP/S3/WebDAV source adapters, per-source and pool-wide circuit breakers, and a FastAPI control plane on 127.0.0.1:8765 exposing /health, /sources, /jobs, /dlq, /stats, and a browser dashboard at /. Configured under ingester: in haiku.rag.yaml; shipped behind the [ingester] extra. See docs/ingester.md.
processing.split_pages (default 0): split PDFs into N-page slices, convert each, merge via DoclingDocument.concatenate(). 0 disables.
Logfire spans for ingestion: ingester.poller.sweep, ingester.poller.watch_event, ingester.job, document.{fetch,convert,chunk,embed,store}, document.convert_slice.

Removed

File monitor (haiku.rag.monitor, MonitorConfig, S3MonitorEntry, AppConfig.monitor, haiku-rag serve --monitor). Migrate monitor.directories to ingester.sources[type=fs] and monitor.s3 to ingester.sources[type=s3].

Changed

haiku-rag serve renamed to haiku-rag mcp. --mcp-port renamed to --port.
document.metadata keys renamed: etag → source_revision, contentType → content_type. v0.50.0 startup migration rewrites existing documents and compacts the documents table at the end (12 GB → 25 GB mid-migration on a 1000-doc PDF corpus, reclaimed back to 12 GB).
providers.docling_serve.base_url now accepts a list. Jobs round-robin across the entries; each job's submit/poll/result pinned to one instance.
DoclingServeClient.submit_and_poll and submit_and_poll_zip no longer wrap httpx errors into ValueError. httpx.ConnectError, HTTPStatusError, TimeoutException, etc. propagate with their type intact.
Logfire spans report instrumentation_scope.name = "haiku.rag" (was the SDK default logfire).
Default RAG skill exposes only search and cite. list_documents and get_document are still available in create_skill_tools and skill_generator's AVAILABLE_TOOLS for custom skills.

0.48.1 - 2026-05-21

Changed

Bump haiku.skills>=0.17.1 and pydantic-ai-slim>=1.100.0 (the last pre-2.0 release). Migrate off two APIs slated for removal in pydantic-ai 2.0: Agent(tool_retries=, output_retries=) → Agent(retries={"tools": …, "output": …}) in the LLM-as-judge evaluator, and Evaluator.evaluation_name class attribute → overriding get_default_evaluation_name() on CitationMRREvaluator / CitationMAPEvaluator.
Drop the item.annotations fallback in _picture_description_text. Docling's PictureItem runs a @model_validator(mode="after") on load that migrates the deprecated annotations field into meta.description, so reading meta.description.text covers both legacy and current blobs. Tests in test_converters.py switched to meta.description.text for the same reason.
Chat TUI now generates a stable per-launch thread_id (rotated on "Clear chat") instead of hardcoding "tui". AGUIAdapter forwards it as the gen_ai.conversation.id OTel attribute, so multi-turn TUI sessions group into one Logfire conversation instead of collapsing every launch into a single bucket.
Documentation generator swapped from mkdocs-material to zensical. Drops mkdocs / mkdocs-material dev deps, replaces mkdocs.yml with zensical.toml, adds overrides/main.html (OG/Twitter share meta) and docs/stylesheets/extra.css. build-docs workflow now runs uv run zensical build and publishes via the GitHub Pages artifact actions instead of mkdocs gh-deploy.

Fixed

mxbai reranker crashing inside the chat TUI with ValueError: bad value(s) in fds_to_keep. tqdm constructs a multiprocessing.RLock on first use, whose resource_tracker spawn picks up sys.stderr.fileno(); Textual's redirected stderr returns -1, failing the fork_exec validation. The reranker now pins tqdm's class lock to a threading.RLock.
migrate failing on DBs upgrading through v0.45.0 with Field 'heading_level' not found in target schema. The v0.45.0 picture-data backfill was building rows from the current DocumentItemRecord Pydantic model — which now carries heading_level / tree_depth added in v0.48.0 — and feeding them to merge_insert against a pre-v0.48.0 schema. Both v0.40.0 and v0.45.0 now build their PyArrow inputs from explicit per-migration column sets so future model additions can't retroactively break them.

0.48.0 - 2026-05-20

Added

heading_level and tree_depth on DocumentItem, populated by extract_items and persisted on document_items. 0.48.0 migration backfills existing rows from each doc's docling structure blob.
toc.json in the analysis sandbox VFS at /documents/{id}/toc.json. Nested tree on HTML/markdown sources, flat sibling list on PDFs. Each node carries {self_ref, level, title, page_numbers, item_range, chunk_ids, children}. chunk_ids aggregates the citable chunks across the section's item_range, so the analysis skill can cite() a section from one VFS read instead of falling back to a corpus-wide search() that risks cross-document hits.
chunk_ids on every items.jsonl row (the citable chunks that contain that item).
picture_refs on sandbox search() result dicts and on Citation (subset of doc_item_refs starting with #/pictures/).
picture_captions: dict[str, str] on SearchResult, populated alongside image_data and rendered as a labelled line in format_for_agent for picture-bearing chunks.
Chat TUI renders picture citations inline via textual_image.widget.Image inside the existing CitationWidget.
CLI citation panel renders picture_refs inline via textual_image.renderable.Image next to the text preview. format_citations_rich is async and takes an optional HaikuRAG client; without one, figures fall back to [Figure: <ref>] markers.
BTree scalar indexes on document_items.{document_id, position, self_ref}. The 0.48.0 migration creates them on existing DBs. Per-doc lookups go from full-table scans (~100–300 ms) to point queries (~3–21 ms) on small/medium corpora.
Per-doc lazy cache for items.jsonl and toc.json in the analysis sandbox. First read fetches; subsequent reads of either file in the same execute_code session hit a serialized cache. One DB fetch per doc per session.
cite tool now accepts chunk_ids that resolve via the chunks table, not only chunk_ids from a prior search() result. Lets the model cite directly from items.jsonl / toc.json rows. The hallucination guard (ModelRetry on chunk_ids that don't exist in the DB) is preserved.
AppConfig.evaluations (EvaluationsConfig) with an optional judge: ModelConfig. Lets the eval CLI pin the LLM-as-judge per-yaml — including a custom base_url for any OpenAI-compatible endpoint (vLLM, LM Studio) without env-var routing.

Removed

Multi-agent research workflow. Removes agents/research/ (graph, state, deps, models, prompts), client.research, the CLI research command, the MCP research_question tool, ResearchConfig, AppConfig.research, PromptsConfig.synthesis, and the corresponding wiring in chat/__init__.py and client/downloads.py. The pydantic-graph workflow was a three-node loop whose differentiators vs the rag skill (planner step, structured ResearchReport, iteration bound) were either redundant with qa.max_searches or sat on pre-cite-tool legacy. Multi-step questions go through client.ask (rag skill).
llm() from the analysis sandbox. Sandbox externals are now search and list_documents only.
list_documents top-level tool from the analysis skill (still available as await list_documents() inside execute_code).
documents= kwarg on client.analyze (and the --document flag on haiku-rag analyze / MCP analyze tool). The pre-loaded documents Python variable inside the sandbox is no longer populated. Use filter= (SQL WHERE clause) to scope analysis to specific documents.
AnalysisResult.program. The per-execution programs are still tracked on AnalysisState.executions (the analysis skill's execute_code tool populates it); consumers that need the executed code should pull it from the skill state instead of the function return value.
--cite flag on haiku-rag ask. Citations always render after the answer now.
system_prompt kwarg on client.ask. No production caller used it; config.prompts.domain_preamble already covers the preamble use case.
Standalone QA agent (haiku.rag.agents.qa.*) and analysis agent module (haiku.rag.agents.analysis.agent, haiku.rag.agents.analysis.prompts). Also drops RawAnalysisResult, CodeExecution, AnalysisDeps, and the dead documents= preload path in Sandbox.
prompts.qa config field.
evaluations optimize subcommand and GEPA prompt-optimization. Drops gepa dep.
--target qa from evaluations run. Default is now rag-skill.
--judge-model flag from evaluations run. Set the judge in config.evaluations.judge instead.
position field on toc.json nodes (redundant with item_range[0]).
position and tree_depth from items.jsonl row serialization. Both fields are still persisted on DocumentItem; they are no longer surfaced to the sandbox.

Changed

haiku.rag.agents.analysis moved to haiku.rag.sandbox. Public surface: from haiku.rag.sandbox import Sandbox, SandboxResult, AnalysisContext, AnalysisResult.
Citation and resolve_citations moved to haiku.rag.store.models.citation (was haiku.rag.agents.research.models), peer to the other output domain models.
search.limit default lowered from 10 to 5.
Search result formatter surfaces picture captions on a labelled line when a chunk's expanded refs include pictures.
Picture bytes attached to search() results are bounded to the pre-expansion chunk's doc_item_refs. Section expansion that sweeps in adjacent picture-bearing items no longer pulls their bytes into the response. Observed ~16× reduction on tool-response payload sizes; eliminates a class of cross-figure contamination in the agent's view.
rag-analysis SKILL.md steers structural lookups ("which section X", "list sections of Y", "summarize section Z") to read /documents/{id}/toc.json first and cite the matching node's chunk_ids directly, instead of calling search(). Empirically validated on the ORB multimodal and Wix corpora: locator and orientation questions now cite the correct document instead of falling back to cross-document search hits.
CLI citation panel compacted: 300-char text preview (no full chunk dump), [N] Title (URI) — pp. — §Section header, dimmed doc: <id> chunk: <id> footer. Green "Citations" label matches the green "Answer:" label.
AnalysisConfig.model defaults to None (was ollama:gpt-oss/no-thinking/temp=0). Resolves via config.analysis.model or config.qa.model.
client.ask and client.analyze route through the rag and rag-analysis skills internally.
Bump docling>=2.93.0 and docling-core>=2.75.0.
Bump pydantic-ai-slim>=1.96.0. Migrate off deprecated APIs: AG-UI imports use pydantic_ai.ui.ag_ui, docs/CLI examples use the explicit openai-chat: model prefix, and Agent(retries=) is split into tool_retries= + output_retries=.
Bump pydantic-monty>=0.0.17. Migrate off deprecated pydantic_monty.run_repl_async(repl, ...) to repl.feed_run_async(...).
Cap transformers<5.0.0 in the mxbai extra: mxbai-rerank>=0.1.6 calls tokenizer.prepare_for_model which transformers 5 removed.
Refresh the rest of the lockfile to latest within current constraints (pydantic, pydantic-ai, rich, ruff, ty, pytest, torch, textual, textual-image, watchfiles, pre-commit, datasets, and transitives).

Fixed

Chat TUI's state-edit screen syntax-highlights JSON instead of falling back to plain text. Adds tree-sitter + tree-sitter-json to the [tui] extra.

Documentation

Rework documentation

0.47.0 - 2026-05-14

Added

cross-encoder reranking provider. Runs any HuggingFace cross-encoder reranker in-process via sentence_transformers.CrossEncoder — no separate server. Useful for BGE (BAAI/bge-reranker-v2-m3), Qwen3-Reranker, MS-MARCO MiniLM, and other CrossEncoder-compatible models when vLLM is not an option. New [cross-encoder] extra pulls sentence-transformers.

Fixed

rebuild --embed-only no longer buffers the entire corpus in memory. The previous implementation accumulated every chunk's id, content, content_fts, metadata, and new embedding vector in a single Python list before flushing. The rebuild now stream-copies non-vector columns into a chunks_rebuild_staging table (1000 rows / page), recreates the chunks table fresh to honour vector-dim changes, then streams from staging one document at a time, embedding in batches of embeddings.batch_size and flushing to the new chunks table every 50 documents.
rebuild --embed-only is now idempotent across crashes. A second table, chunks_rebuild_marker, is written immediately after phase 1 (staging copy) finishes. Its presence flips the next rebuild into resume mode: phase 1 is skipped, the live chunks table is recreated, and phase 2 (re-embed) runs from the existing staging snapshot. Cleanup drops the marker before the staging table, so an interruption between the two drops leaves a markerless staging that the next run discards harmlessly. A staging table without a marker is treated as a partial phase 1 and dropped (the live chunks table is still authoritative). Running a non-embed-only mode (FULL / RECHUNK / DESCRIPTIONS / TITLE_ONLY) after a crashed embed-only correctly discards the staging recovery state. Phase 1's pagination was switched from offset/limit to to_batches, removing the latent offset-drift risk and the O(N²) cost at high offsets.

0.46.0 - 2026-05-13

Added

processing.conversion_options.fetch_remote_images (default true). Controls whether docling fetches images referenced by URL in HTML and Markdown inputs. docling-local only — docling-serve cannot fetch external images via its API regardless of this flag.
s3:// is a first-class document source. create_document_from_source, the CLI haiku-rag add-src, and the MCP add_document_from_url tool all dispatch on the s3 URL scheme. Two-stage change detection keeps metadata["md5"] semantically uniform across all sources: HEAD ETag matching the stored metadata["etag"] short-circuits without GET; if ETag differs but bytes hash to the same MD5 (multipart re-upload, server-side CopyObject, SSE mode change), only the etag refreshes — no re-chunk or re-embed. Closes #357.
S3 / object-storage monitoring. monitor.s3: list[S3MonitorEntry] adds a polling watcher per bucket prefix alongside the existing local-directory watcher. Each entry has its own poll_interval, include_patterns, ignore_patterns, delete_orphans, and storage_options. The same serve --monitor flag enables both. Orphan deletion is per-entry (scoped via uri LIKE 's3://bucket/prefix/%'); other buckets and prefixes are never touched.
[s3] optional extra (obstore>=0.9). Required for s3:// sources and the S3 watcher. Uses obstore — the Python binding to the same Rust object_store crate that LanceDB uses internally — so monitor.s3[*].storage_options accepts the same dict shape as lancedb.storage_options. Empty/missing options fall back to the AWS default credential chain.
scripts/run-integration-tests.sh — wraps docker compose up --wait, pytest -m integration, and tear-down so the SeaweedFS-backed integration suite is a one-liner.
ModelConfig.extra_body. Optional dict forwarded verbatim to ModelSettings.extra_body, the raw pass-through pydantic-ai exposes for openai/ollama/anthropic/groq. Lets configs reach provider-specific keys without haiku.rag modelling them — e.g. extra_body: {chat_template_kwargs: {enable_thinking: false}} to disable Qwen3 thinking on a vLLM endpoint, where the high-level enable_thinking flag is a no-op.
embeddings.batch_size (default 512). Number of text chunks per /v1/embeddings call during ingest. Lower it when your provider caps total tokens per request. Closes #365.

Changed

Chat TUI streams markdown incrementally. Assistant messages now use Textual's MarkdownStream (Markdown.get_stream) and write per-token deltas instead of re-parsing the entire accumulated message on every token. Removes the O(n²) re-parse that visibly stuttered long responses. Bumps textual floor to >=8.2.4 so Markdown.get_stream is reachable via the public API.
Embedding compatibility check only raises on vector_dim mismatch. provider and name drift (legitimate when the same model is served by a different stack, e.g. Ollama → vLLM-via-openai) now logs a one-time warning and updates the stored settings to match the current config. Subsequent opens are silent. Run rebuild --embed-only if you also want to re-embed under the new stack.
processing.pictures enum replaces picture_description.enabled. Three modes: none (skip picture generation entirely — lower RAM, smaller DBs), description (generate images, run VLM, store bytes), image (default — generate images, store bytes, no VLM). Closes #366. Breaking change: rename picture_description.enabled: true → pictures: description, picture_description.enabled: false → pictures: image. The pre-April-30 generate_picture_images flag is also gone; use pictures: none for that opt-out.

Fixed

Picture bytes attached to multimodal tool returns are PNG-verified via PIL.Image.verify(). Bytes that fail verification are dropped.
Conversion options now apply to non-PDF formats. DoclingLocalConverter previously wired its PdfPipelineOptions only to InputFormat.PDF, so user settings (OCR knobs, picture_description.enabled, images_scale, etc.) silently no-op'd for HTML, Markdown, DOCX, PPTX, and IMAGE inputs. The converter now shares a single PdfPipelineOptions instance across PDF, IMAGE, HTML, MD, DOCX, and PPTX FormatOptions. SimplePipeline-backed formats ignore the PDF-specific fields; ConvertPipelineOptions-level enrichments (picture description / classification / chart extraction) now run uniformly. HTML and Markdown additionally receive HTMLBackendOptions / MarkdownBackendOptions gated on fetch_remote_images.
HTML text ingest path picks up converter options. convert_text(format="html"/"md") previously used a bare DoclingDocConverter() with zero format options — the wix corpus ingest path. It now uses the same shared _build_format_options() helper as the file path.
Relative <img> paths resolve during URL ingest. HaikuRAG.convert() and the converter convert_file / convert_text methods now thread a source_uri through to HTMLBackendOptions.source_uri / MarkdownBackendOptions.source_uri. URL ingest uses the originating URL; file ingest uses file://; raw text accepts an optional override. docling-serve accepts the kwarg as a no-op (its API has no equivalent option).
CLI tracebacks no longer dump per-frame locals. The Typer app now passes pretty_exceptions_show_locals=False, so exceptions involving a DoclingDocument (or any large object) print readable rich tracebacks instead of pages of inline base64 image URIs. Set _TYPER_STANDARD_TRACEBACK=1 for plain Python tracebacks.
Batch ingest no longer hits HF Hub's 429 rate limit. The chunking tokenizer is now loaded once per process via @functools.cache instead of once per chunker instance.

Documentation

New "External image fetching" subsection in docs/configuration/processing.md documenting fetch_remote_images, the SSRF / size / timeout guards inherited from docling, and a per-format table of which conversion options actually apply (PDF, IMAGE, HTML, MD, DOCX/PPTX, others).
New "HTML Image Fetching" section in docs/remote-processing.md calling out that docling-serve cannot fetch external <img> URLs and recommending docling-local for HTML ingest when picture bytes matter.
New "S3 / Object Storage Monitoring" section in docs/server.md and docs/configuration/processing.md covering the [s3] extra, polling cadence, ETag semantics, credentials, and CLI usage.
New "Deployment Pattern: One Writer, Many Readers" subsection in docs/configuration/storage.md documenting the recommended IAM split (one ingestion process + N read-only consumers).

0.45.0 - 2026-05-08

Added

Vision capabilities. Picture-aware ingestion, vision QA, multimodal embeddings, and image-as-query search.
Picture bytes always stored at ingest in a new document_items.picture_data column (large_binary), addressable by (document_id, self_ref). Bulk read paths project metadata-only so bytes never leak into context expansion or analysis-sandbox builds. The 0.45.0 migration adds the column on existing DBs and backfills it from each doc's docling blob; URIs are then stripped from the blob so bytes live in one place.
VLM picture descriptions at ingest via processing.conversion_options.picture_description.enabled (default false). When enabled, descriptions are woven into chunk text. The earlier generate_picture_images flag is dropped with a one-time warning. haiku-rag rebuild --descriptions runs the VLM over stored bytes after the fact, idempotently — skipping the docling parse entirely.
Multimodal embedder (provider="vllm") for cross-modal retrieval. Talks HTTP to a vLLM /v1/embeddings endpoint (input array for text, messages superset with image_url for images). Tested with Qwen/Qwen3-VL-Embedding-8B and jinaai/jina-embeddings-v4. No new Python ML dependencies. Under multimodal embedders, ingest emits one synthetic picture chunk per PictureItem, sharing the chunks table with text.
Image-as-query search. client.search() accepts str | bytes | PIL.Image.Image. Image queries embed once and run vector-only against the chunks table. New CLI flag haiku-rag search --image PATH and new MCP tool search_documents_by_image(image_base64, ...) (registered only when the embedder supports images).
Vision QA via qa.model.vision: bool flag on ModelConfig (default false). When true, the agent's search tool attaches picture bytes as BinaryContent parts on its ToolReturn. Default is false because providers behave inconsistently when an image is sent to a text-only model (Ollama silently accepts and confabulates; OpenAI returns 400). SearchResult.image_data: dict[str, str] | None carries base64 picture bytes keyed by self_ref; client.search() and MCP search_documents gain include_images: bool = True.
Silent-failure guard for picture descriptions. When picture_description.enabled=true and a conversion returns at least one picture but zero descriptions, log a warning naming the source, picture count, VLM model, and base URL. Surfaces docling-serve's swallowed VLM errors (unreachable host, missing model) before they pollute a long ingest.
Inspector renders attached pictures under qa.model.vision=true in the context modal (c key) so the inspector reflects what the LLM actually receives.

Fixed

rebuild --descriptions no longer destroys docling_pages. The previous implementation called set_docling() after a structure-only docling load, which writes docling_pages=None and clobbered page rasters for every doc with at least one undescribed picture (silently breaking visualize_chunk for the affected docs).
docling-serve picture-image extraction. docling-serve only emits picture bytes under image_export_mode="referenced" (upstream docling-project/docling-serve#576). The converter switches to referenced + target_type="zip" when picture images are requested and rehydrates artifacts/<filename> URIs back into data: URIs.
rebuild --rechunk reuses the stored docling blob instead of re-converting from the markdown export, which dropped every PictureItem on the floor. Documents without a stored docling blob now raise instead of silently falling back.

Changed

Lazy document hydration during rebuild. Each mode loop now fetches one full record at a time instead of eagerly loading all docs with their multi-MB blobs. Drops startup memory from ~15 GB to ~one document on a 1000-doc database.

0.44.0 - 2026-04-29

Added

Skill-based QA evaluation via evaluations run --target {qa,rag-skill,analysis-skill}. Benchmark the RAG and analysis skills end-to-end alongside the existing QA agent path, against the same datasets and judge. --skill-model "provider:name" overrides the skill model independently from the judge.
Citation retrieval as a second eval metric. CitationMRREvaluator and CitationMAPEvaluator score the URIs the skill registered via the cite tool against each dataset's gold expected_uris, alongside the existing LLMJudge. Console output gains a "Citation Retrieval" summary (mean score, cite rate, mean citations per case). Zero extra skill runs — cited URIs are surfaced via pydantic_evals.set_eval_attribute.
Bumps haiku.skills to >=0.16.0 for the public run_skill API and Skill.request_limit.

Changed

Pinned eval judge defaults to ollama:qwen3.6. Previously --judge-model defaulted to config.qa.model, so changing the QA or skill model also changed the judge — destabilizing cross-run comparisons and re-introducing self-judging whenever the answerer matched. A 2×2 calibration vs Claude Opus 4.7 (gpt-oss / qwen3.6 as both answerer and judge) showed qwen3.6 had κ ≥ 0.66 on both same- and cross-family answerers (vs 0.39–0.55 for gpt-oss) with no detectable self-preference bias. Pass --judge-model provider:name to override.
Tightened cite framing in the RAG skill's SKILL.md. cite is now a precondition for the final answer: the model identifies supporting chunk IDs and calls cite before writing the response. The "MUST cite before answering" requirement carries an explicit refusal carve-out so the model does not cite irrelevant chunks when knowledge is missing. On the wix benchmark this lifted cite rate from 32% → 96%, mean cited_map from 0.15 → 0.48, and cut the "correct answer with no citation" pattern from 52% of cases to 1%, with QA accuracy holding at ~78%.
Removed dataset-specific eval system prompts. WIX_SUPPORT_PROMPT and ORB_SYSTEM_PROMPT duplicated guidance already in the shipped QA_SYSTEM_PROMPT and SKILL.md, and ORB's referenced the obsolete search_documents tool name. The eval-side machinery for injecting them (DatasetSpec.system_prompt, resolve_system_prompt()) is removed. config.prompts.qa remains as the user-facing override knob.

0.43.1 - 2026-04-25

Fixed

Relative db_path no longer trips the LanceDB cloud-URI sanitizer. The 0.43 migration to lancedb.connect_async started routing the path through an async URI sanitizer that treats anything not clearly an absolute local path as a possible cloud URI, raising ValueError: An api_key is required when connecting to LanceDb Cloud on invocations like haiku-rag info --db db/rag.lancedb. The path is now made absolute before being handed to LanceDB.

0.43.0 - 2026-04-24

Changed

Native async LanceDB: all table I/O now uses LanceDB's async API (connect_async, AsyncConnection, AsyncTable). Previously, repository methods were declared async def but called blocking sync LanceDB under the hood, stalling the event loop on every read/write. No change to the documented async with HaikuRAG(...) as client: usage pattern.
BREAKING (internal): HaikuRAG must be used via async with. Store initialization now happens in __aenter__; constructing HaikuRAG(...) and calling methods directly without entering the context manager no longer works.
BREAKING (internal): download_models is no longer a method on HaikuRAG. It's now a module-level function: from haiku.rag.client.downloads import download_models; async for progress in download_models(config): .... The CLI and in-repo consumers are updated.
Concurrency: background vacuum tracked as a task on the client. __aexit__ and rebuild_database now await it explicitly, preventing CreateIndex transaction was preempted commit conflicts when destructive operations follow a create_document that scheduled a background vacuum.

Fixed

Chat TUI now renders citations again. After the 0.42.1 flattening of skill state citations to list[str], the TUI still indexed citations[-1] and iterated the resulting chunk-id string character-by-character, so no citations resolved through citation_index and the citation panel stayed empty. Fixed by iterating state.citations directly.
search(..., filter=...) no longer silently under-returns. The filter path used to materialize LanceDB's top-N window, filter to matching document_ids in pandas, and head(limit). When matching chunks lived outside that top-N window (selective filters, broad queries), the caller got fewer than limit results even though plenty of matching chunks existed in the index. The document filter is now pushed down into the chunk query as document_id IN (...) so .limit(limit) applies to matching chunks directly. Behavior change: searches that previously under-returned will start returning the requested count.

0.42.1 - 2026-04-22

Changed

BREAKING: Skill state citations is now list[str] instead of list[list[str]]. With per-invocation state scoping (0.42), the outer list no longer tracked turn boundaries — it only grouped chunk ids per cite call within a single invocation, which has no downstream meaning. The field is now a flat, deduplicated list of chunk ids cited during the current invocation. Clients resolve each id through citation_index as before. Applies to both RAGState and AnalysisState.

0.42.0 - 2026-04-22

Fixed

create_document, update_document, and rebuild (RECHUNK / full fallback) no longer misread URL-prefixed text as a URL to fetch. These paths passed known-text content through HaikuRAG.convert(), which dispatches on urlparse(source).scheme; text whose first line was https://... (common for clipped web pages and notes) got handed to httpx.get and crashed with httpx.InvalidURL on embedded whitespace. Fixed by calling converter.convert_text(...) directly at those sites; convert() itself is unchanged for create_document_from_source.

Changed

Skills share a single HaikuRAG client per invocation via the new haiku.skills>=0.15.0 lifespan hook. The skill's sub-agent opens one read-only client on entry, all tool calls reuse it, and it closes on exit — replacing the old pattern of open/close around every search / list_documents / get_document call.
max_searches tracked on RAGRunDeps.search_count instead of a module-level ctx.run_id-keyed dict. Eliminates a memory leak in long-running processes where old run ids were never evicted.
Analysis sandbox persists variables across execute_code calls within one invocation. Re-enables the incremental-exploration workflow (search in one call, process results in the next). Each new skill invocation constructs a fresh Sandbox via the analysis lifespan, so there is no cross-invocation leak.
Skill state is scoped to the current invocation. Lifespans now clear citations, searches, and (for analysis) executions at the start of each invocation, so state deltas sent to the AG-UI client reflect only the in-progress turn. citation_index is preserved across invocations so past-turn citation chunk ids remain resolvable, and document_filter is preserved as session-level config.

0.41.0 - 2026-04-20

Added

Document virtual filesystem in analysis sandbox: Documents mounted at /documents/{id}/ with metadata.json (eager), content.txt (lazy), and items.jsonl (lazy). Standard Python pathlib.Path for browsing and reading document content and structure.
execute_code skill tool: Direct code execution in the sandbox, surfaced as individual AG-UI events in the chat TUI. Items VFS uses a lazy bulk cache (~1s for 1000 documents vs 60s+ per-document queries).
cite skill tool: Explicit citation registration with per-turn tracking via citation_index and citations fields in state
--skill flag for chat TUI: haiku-rag chat -s rag -s analysis to enable specific skills
--model overrides all agents: Chat, QA, research, and analysis agents all use the specified model
Collapsible program display in chat TUI: Analysis code execution results shown as expandable code blocks

Changed

BREAKING: Flatten skill architecture: Skill sub-agents now call search, execute_code, cite, list_documents, get_document directly — every tool call surfaces as an AG-UI event. Removes the 3rd agent layer where ask/analyze/research spawned inner agents whose tool calls were invisible.
BREAKING: Rename RLM agent to analysis agent throughout:
agents/rlm/ → agents/analysis/, all classes renamed (RLMResult → AnalysisResult, etc.)
client.rlm() → client.analyze()
CLI: haiku-rag rlm → haiku-rag analyze
MCP: rlm_question → analyze
Config: rlm: → analysis: in YAML, RLMConfig → AnalysisConfig
Skill entrypoint: rag-rlm → rag-analysis
Analysis sandbox search() returns expanded results with doc_item_refs and labels for cross-referencing with items.jsonl
list_documents skill tool takes no parameters — returns all documents
Per-turn citation tracking: citation_index: dict[str, Citation] (deduplicated) + citations: list[list[str]] (per-turn chunk IDs) replaces flat citation list
Search rate limiting: Skill search tool enforces config.qa.max_searches
Context expansion respects section boundaries: Sections within the char budget are returned whole regardless of item count. Too-large sections expand bounded by section edges. Adjacent sections no longer merge — only overlapping ranges do.
Visualization shows full expanded section: visualize_chunk expands context before resolving bounding boxes, so all pages the section spans get highlighted.

Removed

ask skill tool: Replaced by direct search + cite — the skill sub-agent searches and answers directly
analyze skill tool: Replaced by direct execute_code + search + cite
research skill tool: Removed from skill layer (still available via CLI haiku-rag research and MCP)
get_document(), get_docling_document(): Removed from analysis sandbox — replaced by VFS
get_chunk(): Removed from analysis sandbox — search results include expanded context
create_analysis_toolset(): Removed unused tools/analysis.py module
qa_history, reports from skill state: Conversational context handled by the outer chat agent
combine_filters, build_document_filter: Removed from public API
max_context_items: Removed from SearchConfig — max_context_chars is the sole expansion constraint
QAHistoryEntry, tools/qa.py: Removed unused QA history model and relevance threshold

0.40.1 - 2026-04-17

Fixed

haiku-rag info on pre-migration databases: info no longer fails with a misleading Cannot create tables in read-only mode error when a required table added by a later version (e.g. document_items in 0.40.0) is absent. It now reports stats for the tables that do exist, marks the missing ones as absent, and shows a dedicated section listing any pending migrations with the haiku-rag migrate hint (#346)

0.39.0 - 2026-04-16

Added

Document items table: Pre-extracted document items stored as individual rows with scalar indexes, enabling context expansion via indexed range queries (~2.5ms) instead of full DoclingDocument deserialization (~8.7s for large documents)
Section-bounded context expansion: Expansion is now automatic and structure-aware — stays within section boundaries for structured documents, grows outward for unstructured ones. Noise labels (footnotes, page headers/footers) are filtered. Results without doc_item_refs pass through unexpanded.

Changed

Database migration required: Run haiku-rag migrate to populate document_items table for existing documents
Pin docling-core: Upper bound added (<2.72) to prevent uncontrolled schema changes
max_searches default: Raised from 3 to 5 — faster expansion makes additional searches inexpensive
Improved QA prompt: Stronger instruction to refuse answering from tangentially related content
Improved judge prompt: Asymmetric evaluation — generated answers that are more comprehensive than expected are not penalized

Removed

context_radius config: Replaced by automatic section-bounded expansion. Context expansion no longer requires configuration.
DoclingDocument LRU cache: No longer needed — the document_items table replaces in-memory caching for context expansion
cachetools dependency: No longer used

0.39.0 - 2026-04-09

Added

S3/Object storage support: Connect to LanceDB on S3, GCS, Azure Blob, or HDFS via lancedb.uri and storage_options config. Supports S3-compatible stores with custom endpoints.
Remote skill generation: create-skill now supports remote databases — omit --db and provide --config-file to generate skills that connect to object storage at runtime instead of bundling the database.

Fixed

Skill list_documents ignores document_filter: list_documents tool now respects state.document_filter, consistent with search, ask, and research
Skill analyze ignores document_filter: analyze tool now uses state.document_filter (combined with any explicit filter parameter). Added document_filter field to RLMState

0.38.0 - 2026-04-07

Added

Separate page storage: Page images stored in dedicated docling_pages column — search/expand never loads page data
zstd compression: Switch from gzip to zstd for docling document storage (Python 3.14 stdlib, zstandard package for older versions)
Document.set_docling(): Helper method that handles split compression and version assignment, replacing 11 manual call sites
Document.get_page_images(): Load page images without the document structure, for visualize_chunk
DocumentRepository.get_pages_data(): Load only page data column for a document

Changed

Database migration required: Run haiku-rag migrate to split existing docling blobs into structure + pages and re-compress with zstd

Fixed

Generated skill domain_preamble: Apply config.prompts.domain_preamble to instructions in generated skill packages

0.37.0 - 2026-04-07

Changed

Dependency updates: lancedb 0.30.2, pydantic-ai-slim ≥1.77.0, docling ≥2.84.0, docling-core ≥2.71.0, haiku.skills ≥0.13.0, cachetools ≥7.0.5, pydantic-monty ≥0.0.9, cohere ≥5.21.1, textual ≥8.2.1, ty ≥0.0.28, ruff ≥0.15.9
Search result model: SearchResult now includes order field propagated from chunk order

Fixed

Type checking: Fix 37 new ty 0.0.28 diagnostics with proper None guards, assertions, and specific ignore codes
Search performance: Avoid loading full document blobs (docling_document, content) during search — use column projection to fetch only needed metadata (id, uri, title, metadata)
Context expansion performance: Load only docling columns during expand_context (skip content blob), and only when doc_item_refs exist
Chunk expansion performance: Fetch only chunks in the needed order range during context expansion instead of all chunks for a document
Embedding batching: Batch embedding calls in groups of 512 to avoid request size limits and timeouts with large documents
DoclingDocument validation: Strip page images before validation on the read path — pages are only needed for visualize_chunk and account for ~99% of the JSON size

0.36.3 - 2026-04-01

Fixed

Citation formatting: Replace raw UUIDs ([doc_id:chunk_id]) with human-readable identifiers ([index] title) in format_citations() output, preventing LLMs from hallucinating opaque ID markers in answers
domain_preamble propagation: domain_preamble now flows to skill subagents and the main agent preamble, not just internal agents (QA, research). Fixes ambiguous queries failing when domain context was needed.

Changed

domain_preamble docs: Clarified that domain_preamble is for domain context (subject matter, terminology), not behavioral instructions (tone, response style).

0.36.2 - 2026-03-28

Fixed

Skill extras: Include db_path and config in skill extras for both RAG and RLM skills, enabling post-creation reconfiguration

0.36.1 - 2026-03-27

0.36.0 - 2026-03-26

Added

Chunk visualization for generated skills: visualize_chunk(chunk_id) function exposed in generated skill packages, enabling callers to render visual grounding from chunk IDs in skill state
Configurable generated skills: Generated skill create_skill() now accepts optional db_path and config parameters, enabling post-discovery reconfiguration via skill.reconfigure() (requires haiku.skills >= 0.11.0)

Fixed

Generated skill packages: Include SKILL.md and assets in wheel distributions. Add README to generated packages.
Docling-serve chunker: Detect per-document failure status that was silently returning 0 chunks when the task-level status was "success" but individual documents failed
Docling local chunker: Re-enable repeat_table_header for self-contained table chunks, improving retrieval quality and matching docling-serve behavior

0.35.1 - 2026-03-24

Added

create-skill CLI command: Generate standalone skill packages with embedded LanceDB databases. Generated packages register as haiku.skills entry points.

0.35.0 - 2026-03-24

Added

Configurable judge and reflect models: evaluations run and evaluations optimize accept --judge-model provider:name; optimize also accepts --reflect-model provider:name. Both fall back to config.qa.model when not specified.
parse_model_option: Utility in haiku.rag.utils for parsing provider:name strings into ModelConfig
New format extensions: .tex, .latex, .qmd (Quarto), .rmd (R Markdown) supported in both local and serve converters

Changed

LLMJudge: Custom evaluator now accepts ModelConfig instead of a model name string
Docling upgrade: docling-core ≥2.70.2 (schema 1.10.0), docling ≥2.81.0. Adds field data model support for structured form/KV content, wide table chunking fixes, and rich table cell hang fix
pydantic-ai ≥1.70.0: Bumped minimum version. Removed structured_output_type helper — all supported providers now handle native structured output, so agents pass result types directly

0.34.1 - 2026-03-16

Added

PlantUML support: .puml, .plantuml, and .pu files are now indexed as plantuml code blocks

0.34.0 - 2026-03-13

Added

Activity events: TUI and web frontend now display skill sub-agent tool calls via ActivitySnapshotEvent

Changed

RLM sandbox: Bumped pydantic-monty to 0.0.8. Removed regex_* external functions — the sandbox now has native re and math modules via import. Also adds filter() and getattr() builtins.
Frontend deps: Upgraded CopilotKit to 1.54.0 and @ag-ui/client to 0.0.47

0.33.3 - 2026-03-12

Added

GEPA prompt optimization: evaluations optimize command for automated QA system prompt improvement using evolutionary optimization with LLM-judged scoring. Cases are split 50/50 into train/val sets; GEPA budget is auto-computed from --num-candidates and dataset size.
Tuning docs: Added step 7 (Optimize QA Prompts) to the tuning workflow in docs/tuning.md
Evaluations test coverage: Tests for evaluators (MAP, MRR), config, benchmark helpers, dataset mappers/builders, and optimization

Fixed

Read-only mode table creation: --read-only no longer creates lance tables when pointed at an empty directory. Store._init_tables() now raises ReadOnlyError when tables are missing in read-only mode.

0.33.2 - 2026-03-11

Changed

QA search cap: Replace dead max_iterations/max_concurrency config with max_searches (default: 3). The QA agent now enforces a per-run search limit, reducing average response time from ~30s to ~15s while maintaining accuracy. The limit resets per agent run so toolsets can be safely reused.
Default search limit: Increased from 5 to 10 results per search query for better coverage.

Fixed

QA citations: Strengthened prompt to clarify chunk ID format (complete IDs without brackets). resolve_citations now strips [] from IDs, handling models that copy brackets from search result formatting.

0.33.1 - 2026-03-06

Changed

Default model temperatures: Set task-appropriate temperature defaults — 0.3 for QA, research, and title generation; 0.0 for RLM and picture description. Previously unset (provider defaults, typically 0.7–1.0).
QA thinking enabled by default: enable_thinking now defaults to True for QA agent, improving answer quality with reasoning models.
Default title max_tokens: Set max_tokens=100 for title generation model to keep titles concise
Evaluation judge: Set temperature=0.0 and enable_thinking=True for deterministic, higher-quality judging. Removed unused judge config from retrieval benchmarks.
Test suite cleanup: Removed stale VCR cassettes, dead fixtures, orphaned directories, and redundant tests. Strengthened weak assertions across search, context enhancement, and converter tests. Relocated misplaced SearchResult._get_primary_label test to test_search.py
Parallel test execution: Added pytest-xdist and enabled parallel test runs by default (-n auto), reducing test suite time from ~3.5 min to ~2 min

0.33.0 - 2026-03-04

Added

Module-level skill introspection API: STATE_TYPE, STATE_NAMESPACE, skill_metadata(), instructions(), and state_metadata() on haiku.rag.skills.rag and haiku.rag.skills.rlm — allows introspecting skill configuration without calling create_skill()
Automatic structured output detection: Native JSON schema output is used automatically when the model supports it, with tool-call fallback otherwise. No configuration needed.

Changed

haiku.skills dependency: Bumped to >=0.7.0 for StateMetadata dataclass

0.32.3 - 2026-03-03

Changed

AG-UI skill streaming: Tool calls within skills are now streamed as real-time AG-UI events to the frontend. Requires haiku.skills>=0.6.0

Fixed

Search tool regression: Removed LLM-facing filter parameter from search and list_documents tools. The SQL WHERE clause description confused LLMs, degrading QA accuracy. Document filtering is now handled programmatically via base_filter and state.document_filter

0.32.2 - 2026-02-28

Fixed

Compatibility with haiku.skills 0.5.1: Replaced removed SkillToolset.system_prompt with build_system_prompt(toolset.skill_catalog) across chat TUI, backend app, and examples
Minimum dependency: Bumped haiku.skills requirement to >=0.5.1
Chat model default: Chat TUI and backend app now use the configured QA model instead of hardcoded openai:gpt-4o

0.32.1 - 2026-02-26

Added

Automatic title generation: Documents can now have titles auto-generated during ingestion via processing.auto_title: true. Uses two-tier extraction: structural metadata from DoclingDocument (HTML <title>, h1, section headers) first, with LLM fallback via configurable processing.title_model
generate_title(): Public method on HaikuRAG to generate a title for an existing document on demand
rebuild --title-only: New rebuild mode that generates titles only for untitled documents without re-chunking or re-embedding
add --title: CLI option to set a title when adding text documents

0.32.0 - 2026-02-24

Changed

RLM sandbox: Replaced Docker-based code execution with pydantic-monty, a minimal secure Python interpreter written in Rust. Eliminates Docker as a runtime dependency for RLM with sub-millisecond sandbox startup
RLM sandbox functions: Added get_chunk(chunk_id) for retrieving chunk content and metadata from search results. get_docling_document(document_id) now returns the full document structure as a JSON dict. All sandbox functions now require await
RLMConfig: Removed docker_image and docker_memory_limit fields

Added

RLM sandbox regex functions: regex_findall, regex_sub, regex_search, regex_split for pattern matching without LLM calls
HaikuRAG.get_chunk_by_id(): Public method for chunk lookup by ID

Removed

docker_sandbox.py, runner.py: Docker container plumbing replaced by sandbox.py

0.31.1 - 2026-02-20

Fixed

info and history commands: Open database in read-only mode to prevent write failures on read-only filesystems

0.31.0 - 2026-02-20

Added

RAG skill (haiku.rag.skills.rag): haiku.skills integration with search, list_documents, get_document, ask, and research tools plus managed RAGState
RLM skill (haiku.rag.skills.rlm): haiku.skills integration with analyze tool for computational analysis via code execution
HaikuRAG.research(): Client method for multi-agent research
haiku.skills entry points: rag = "haiku.rag.skills.rag:create_skill", rag-rlm = "haiku.rag.skills.rlm:create_skill"

Changed

Chat TUI: Rebuilt on RAG skill + haiku.skills SkillToolset
Web app backend: Rebuilt on RAG skill + AGUIAdapter
Toolsets simplified: Removed ToolContext, SessionState, AgentDeps, Toolkit; kept core FunctionToolset factories
Research graph: Removed session_context and conversational output mode

Removed

agents/chat/: Entire chat agent module (replaced by RAG skill)
--deep flag: Removed from ask CLI (use research command instead)
--context/--context-file: Removed from ask CLI
tools/ state machinery: ToolContext, ToolContextCache, SessionState, AgentDeps, Toolkit, etc.

0.30.2 - 2026-02-19

Fixed

Added cachetools as an explicit dependency (was only available transitively, causing ModuleNotFoundError for some installations)
download-models: Show actionable error message when Ollama is not running instead of cryptic "All connection attempts failed" (#277)

0.30.1 - 2026-02-17

Changed

AG-UI state sync: ask tool now emits StateDeltaEvent (JSON Patch) instead of StateSnapshotEvent, consistent with the search tool

0.30.0 - 2026-02-16

Added

Composable toolsets: New haiku.rag.tools module with reusable FunctionToolset factories that can be mixed into any pydantic-ai agent
create_search_toolset() — hybrid search with context expansion and citation tracking
create_document_toolset() — document listing, retrieval, and summarization
create_qa_toolset() — question answering via research graph with prior answer recall
create_analysis_toolset() — computational analysis via RLM agent (Docker sandbox)
Toolkit and build_toolkit(): High-level factory that bundles toolsets, prompt, and context creation for a given feature set. Reduces agent composition from ~15 lines to ~5. build_chat_toolkit() adds chat-specific defaults (background summarization callback)
ToolContext: Namespace-based state container shared across toolsets. Toolsets register Pydantic models under string namespaces, enabling state accumulation (search results, citations, QA history) across invocations
ToolContextCache: In-memory TTL-based cache for ToolContext instances, keyed by external session/thread ID. Replaces module-level caches for embeddings and summaries
run_qa_core(): Extracted core QA function for direct programmatic use without an agent
Feature-based chat agent: create_chat_agent() accepts a features list to select which toolsets are enabled (search, documents, qa, analysis). System prompt is composed to match
New documentation: docs/tools.md covers all toolsets, ToolContext, state management, filter helpers, and composing custom agents

Changed

Toolset factories decoupled from runtime dependencies: create_search_toolset(), create_qa_toolset(), create_document_toolset(), create_analysis_toolset(), and create_chat_agent() no longer take client or context parameters. Instead, tool functions receive these via pydantic-ai's RunContext.deps. This enables toolset and agent creation at configuration time (cacheable, created once), with only lightweight deps created per-request. Deps must satisfy the RAGDeps protocol (client: HaikuRAG, tool_context: ToolContext | None)
Toolset factory return types narrowed to FunctionToolset[RAGDeps]: All four toolset factories now declare their return type as FunctionToolset[RAGDeps] instead of bare FunctionToolset
create_chat_agent() accepts optional toolkit parameter: Pass a pre-built Toolkit to share toolsets between agent and context creation, avoiding duplicate construction
ChatDeps now includes client: ChatDeps(config=..., client=..., tool_context=...) — the client field was added since it's no longer captured by the agent factory
prepare_chat_context() helper: Extracted from create_chat_agent() for idempotent namespace registration, since the agent factory no longer has access to the context
Chat agent architecture: Rebuilt on composable toolsets instead of monolithic tool definitions. Chat agent is now a thin wrapper around create_search_toolset, create_document_toolset, create_qa_toolset, and create_analysis_toolset
State management simplified: Removed session_id, incoming_session_id, and incoming_session_context from the state layer. ToolContextCache preserves all state (embeddings, summaries, QA history) on cached ToolContext instances, eliminating the need for module-level caches
AG-UI state sync: ask tool now emits StateSnapshotEvent instead of StateDeltaEvent, ensuring background summarization results are reliably delivered to clients
TUI simplified: Chat TUI reads directly from ToolContext namespace states instead of maintaining a separate ChatSessionState and manually syncing via AG-UI state events
AG-UI web app: Uses ToolContextCache to maintain per-thread state across requests
Frontend session management: Persistent chat sessions with localStorage, wired to backend ToolContextCache via CopilotKit threadId
Session manager dropdown: create, switch, delete, and export sessions to markdown
Messages, chat state, and citations restored on session switch
Session title derived from first user message
Inline citation blocks injected after assistant responses via qa_history correlation

Removed

SearchAgent: Replaced by create_search_toolset()
Module-level session caches: _session_cache, cache_session_context, get_cached_session_context, cache_question_embedding, get_cached_embedding — all replaced by cached ToolContext
ChatSessionState from TUI: TUI no longer maintains its own copy of session state

0.29.1 - 2026-02-10

Fixed

Document listing memory usage: list_documents no longer loads full document content and docling blobs by default, preventing out-of-memory errors on large databases. Use include_content=True when content is needed.
Chat session_id not persisting across AG-UI requests: ChatSessionState.session_id now defaults to "" instead of auto-generating a UUID. This ensures the session_id assignment is detected as a state change and included in the StateDeltaEvent delta, allowing clients to persist it across requests.

0.29.0 - 2026-02-06

Added

docling-serve Chunker OCR Options: The docling-serve chunker now respects OCR settings from conversion_options
Passes do_ocr, force_ocr, ocr_engine, and ocr_lang to the chunking API
Allows disabling OCR via config when running docling-serve in read-only containers
RLM Agent (Recursive Language Model): New agent for complex analytical tasks via sandboxed Python code execution
Solves problems traditional RAG can't handle: aggregation, computation, multi-document analysis
Docker-based sandbox with full Python environment (no import restrictions)
Container reuse within a single rlm() call for reduced latency
Available functions: search(), list_documents(), get_document(), get_docling_document(), llm()
Pre-loaded documents support via documents variable
Context filter for scoping searches without LLM control
New client.rlm(question) method on HaikuRAG client
New haiku-rag rlm CLI command
New rlm_question MCP tool
New config options: docker_image, docker_memory_limit
CI: Docker sandbox integration tests run in GitHub Actions

Fixed

CI: Cache HuggingFace tokenizer to prevent flaky test failures when HuggingFace has transient outages

0.28.0 - 2026-01-31

Changed

Iterative Research Planning: Research graph now uses an iterative feedback loop instead of batch question processing
Planner proposes ONE question at a time, sees the answer, then decides whether to continue
Removes gather_context tool — planner proposes questions directly
Simpler flow: plan_next → search_one → loop back until complete → synthesize
Consolidated build_conversational_graph() into build_research_graph(output_mode="conversational")

Removed

Dead config options: Removed vestigial fields from iterative planning refactor
confidence_threshold from ResearchConfig and ResearchState (LLM decides completion via is_complete)
max_sub_questions from QAConfig (iterative flow uses one question at a time)
sub_questions field from ResearchContext (no longer populated)

0.27.2 - 2026-01-29

Added

Deep Ask Evaluations: QA benchmarks can now use the research graph for multi-step reasoning
New --deep flag on evaluations run enables deep ask mode
Uses research graph with max_iterations=2 and confidence_threshold=0.0
Evaluation name automatically suffixed with _deep when enabled
Experiment metadata includes deep_ask field for tracking
Chat Agent Document Awareness Tools: Two new tools for browsing and understanding the knowledge base
list_documents — Returns DocumentListResponse with paginated documents (50 per page), page number, total pages, and total count; respects session document filter
summarize_document — Generate LLM-powered summaries of specific documents
Document Count API: New count_documents(filter) method on HaikuRAG client for efficient document counting
Read-Only Initial Context: Initial context is now locked after the first message, providing consistent session context
Chat TUI: --initial-context CLI option sets background context for the session
Context can be edited via command palette before the first message is sent
After first message, context becomes read-only (view only)
Clearing chat resets context to CLI value and unlocks editing
Web app: Memory panel now serves dual purpose - edit initial context before first message, view session context after
Agent uses initial_context as fallback when session_context is empty

Changed

AG-UI State Delta Updates: Web application now sends StateDeltaEvent (JSON Patch RFC 6902) instead of full StateSnapshotEvent for state updates
Reduces bandwidth when state grows large (e.g., 50 Q&As with citations)
First request still sends full snapshot; subsequent requests send only changes
Backend logging shows incoming/outgoing state events for debugging

Fixed

Chat TUI Session State Sync: TUI now syncs full session state from AG-UI events

0.27.1 - 2026-01-27

Added

Initial Context for Chat Sessions: New initial_context field on ChatSessionState allows external clients to seed sessions with background context
Static context set once at session creation, used as fallback when no cached session context exists
Incorporated into first summarization, after which evolved session_context takes precedence
Eliminates need for clients to import and call internal cache functions (cache_session_context, get_cached_session_context)
session_id now auto-generates a UUID if not provided (previously defaulted to empty string)

Fixed

AG-UI StateSnapshotEvent JSON Serialization: Chat agent tools now use model_dump(mode="json") when creating StateSnapshotEvent
Fixes TypeError: Object of type datetime is not JSON serializable when external clients persist AG-UI state to database JSON columns

0.27.0 - 2026-01-26

Added

Evaluation Database Hosting: Pre-built evaluation databases available on HuggingFace
evaluations download <dataset> downloads pre-built databases from ggozad/haiku-rag-eval-dbs
evaluations upload <dataset> uploads databases to HuggingFace (maintainer only)
Supports all argument to download/upload all datasets at once
Use --force flag to overwrite existing databases
Avoids lengthy database rebuild times for users running benchmarks
Stable Citation Registry: Citation indices now persist across tool calls within a session
Same chunk_id always returns the same citation index (first-occurrence-wins)
New citation_registry: dict[str, int] field on ChatSessionState
New get_or_assign_index(chunk_id) method for stable index assignment
Registry serialized/restored via AG-UI state protocol
Prior Answer Recall: The ask tool automatically checks conversation history before research
Finds semantically similar prior answers using embedding similarity (0.7 cosine threshold)
Relevant prior answers are passed to the research planner as context
Planner can return empty sub_questions when context is sufficient, avoiding redundant searches
Dynamic Session Context: Compressed conversation history for multi-turn chat
New SessionContext model stores summarized conversation state instead of raw Q&A history
Background LLM-based summarization runs after each ask tool call (non-blocking)
Previous summarization tasks are cancelled when new ones start
Research graph receives compact context (~1,000-2,000 tokens) instead of raw qa_history (potentially thousands of tokens)
New session_context field on ChatSessionState synced via AG-UI state protocol
Chat TUI: New context modal (Ctrl+O) to view current session context
Session Document Filter: Restrict all search/ask operations to selected documents
New document_filter field on ChatSessionState stores list of document titles/URIs
Session filter combines with per-tool document_name filter using AND logic
Multi-document selection uses OR logic within the session filter
Filter persists across tool calls and chat clears via AG-UI state protocol
Chat TUI: Access via command palette ("Filter documents" command)
Web Application: Filter button in header shows count of selected documents

Changed

Dependencies: Updated core dependencies
pydantic-ai-slim: 1.44.0 → 1.46.0
lancedb: 0.26.1 → 0.27.0
docling: 2.68.0 → 2.69.1
docling-core: 2.59.0 → 2.60.1
VoyageAI Embeddings: Now uses pydantic-ai-slim's native VoyageAI support instead of custom implementation
Removed haiku.rag.embeddings.voyageai module
The voyageai extra now delegates to pydantic-ai-slim[voyageai]

Removed

Q&A History Functions: Removed standalone conversation history utilities
rank_qa_history_by_similarity() - similarity matching now integrated into ask tool
format_conversation_context() - replaced by SessionContext summarization
Associated embedding cache and helper functions also removed

0.26.9 - 2026-01-22

Fixed

v0.25.0 Migration Failure: Fixed "Table 'documents' already exists" error during migration caused by held table references preventing drop_table() from succeeding. Added recovery logic to restore documents from staging table if a previous migration attempt failed mid-way.

0.26.8 - 2026-01-22

Added

Jina Reranker v3: Added support for Jina reranking with API mode (provider: jina) and local inference (provider: jina-local, requires [jina] extra)
Model Downloads: download-models now pre-downloads HuggingFace models for sentence-transformers, mxbai, and jina-local
Reranker Factory: Removed unreliable id(config)-based caching from get_reranker(); factory now always instantiates fresh

Changed

Agent Search Result Display: Search results now show rank position instead of raw scores
SearchResult.format_for_agent() accepts optional rank and total parameters
Output changes from (score: 0.02) to [rank 1 of 5] when rank is provided
Prevents LLMs from misinterpreting low RRF hybrid search scores as "2% relevant"
QA and Research agents updated to pass rank/total to formatted results
Agent prompts updated to reference rank-based ordering instead of scores

Fixed

Test Cassette Organization: Consolidated all VCR cassettes to tests/cassettes/
Environment Loading: Fixed .env file loading to search from current working directory instead of source file directory (#250) - thanks @tianyicui

0.26.7 - 2026-01-20

Added

OCR Engine Selection: New ocr_engine option in conversion_options to explicitly select OCR backend (#246)
Supported engines: auto (default), easyocr, rapidocr, tesseract, tesserocr, ocrmac
Works with both docling-local and docling-serve converters
Fixes inconsistent OCR engine selection between docling-serve startup and conversion requests

Removed

A2A Example: Removed examples/a2a-server/ A2A protocol server example
Stale Example References: Cleaned up references to removed ag-ui-research example from documentation

Changed

MCP Error Handling: MCP tools now let exceptions propagate naturally; FastMCP converts them to proper MCP error responses
Chunk Contextualization: Consolidated duplicate contextualize logic into Chunk.contextualize_content() method
Type Checker: Replaced pyright with ty, Astral's extremely fast Python type checker
Added explicit Agent[Deps, Output] type annotations to all pydantic-ai agents for better type inference
Removed ~24 unnecessary # type: ignore comments that ty correctly infers
Dependencies: Updated to latest versions
pydantic-ai-slim: 1.39.0 → 1.44.0
docling: 2.67.0 → 2.68.0
pathspec: 0.12.1 → 1.0.3
textual: 7.0.0 → 7.3.0
datasets: 4.4.2 → 4.5.0
ruff: 0.14.11 → 0.14.13
opencv-python-headless: 4.12.0.88 → 4.13.0.90

Fixed

Chat TUI: Fixed crash when logfire is installed but user is not authenticated (#247)

0.26.6 - 2026-01-19

Changed

Explicit Database Migrations: Database migrations are no longer applied automatically on open
Opening a database with pending migrations now raises MigrationRequiredError with a clear message
New haiku-rag migrate command to explicitly apply pending migrations
Version-only updates (no schema changes) are applied silently in writable mode
New skip_migration_check parameter on Store for tools that need to bypass the check
Store.migrate() method returns list of applied migration descriptions

0.26.5 - 2026-01-16

Added

Background Context Support: Pass background context to agents via CLI or Python API
haiku-rag ask --context "..." --context-file path for Q&A with background context
haiku-rag research --context "..." --context-file path for research with background context
haiku-rag chat --context "..." --context-file path for chat sessions with persistent context
ResearchContext(background_context="...") for Python API usage
ChatSessionState(background_context="...") for chat agent sessions
Context is included in agent system prompts and research graph planning
Frontend Background Context: Settings panel in the chat app to configure persistent background context
Context is stored in localStorage and sent with each conversation
Frontend Linting: Added Biome for linting and formatting the frontend codebase

0.26.4 - 2026-01-15

Added

AGUI_STATE_KEY Constant: Exported AGUI_STATE_KEY ("haiku.rag.chat") from haiku.rag.agents.chat for namespaced AG-UI state emission
Enables integrators to use a consistent key when combining haiku.rag with other agents
Backend, TUI, and frontend now use this key for state emission and extraction

0.26.3 - 2026-01-15

Added

Enhanced Database Info: haiku-rag info now displays pydantic-ai version and docling-document schema version
Keyed State Emission for Chat Agent: New state_key parameter in ChatDeps for namespaced AG-UI state snapshots
When set, tools emit {state_key: snapshot} instead of bare state, enabling state merging when multiple agents share state
Default None preserves backwards compatibility (bare state emission)
Page Image Generation Control: New generate_page_images option in ConversionOptions to control PDF page image extraction

Changed

CLI Error Handling: Commands (rebuild, vacuum, create-index, ask, research) now propagate errors with proper exit codes instead of swallowing exceptions

Fixed

Embed-only rebuild with changed vector dimensions: Fixed haiku-rag rebuild --embed-only failing when the configured embedding model has different dimensions than the database
Store now reads stored vector dimension when opening existing databases, allowing chunks to be read regardless of current config
_rebuild_embed_only recreates the chunks table to handle dimension changes
generate_page_images: bool = True - Enable/disable rendered page images (used by visualize_chunk())
Works with both docling-local and docling-serve converters
For docling-serve, maps to image_export_mode API parameter (embedded/placeholder)
Note: generate_picture_images (embedded figures/diagrams) works with local converter but has limited support in docling-serve

0.26.2 - 2026-01-13

Changed

Dependencies: Updated docling dependencies for latest docling-serve compatibility (#229)
docling-core: 2.57.0 → 2.59.0 (supports schema 1.9.0)
docling: 2.65.0 → 2.67.0

0.26.1 - 2026-01-13

Fixed

Docling Schema Version Mismatch: Fixed incompatibility between docling and docling-core causing ValidationError: Doc version 1.9.0 incompatible with SDK schema version 1.8.0 when adding documents (#229)
Root cause: docling-core was reverted to 2.57.0 (schema 1.8.0) for docling-serve compatibility, but docling remained at 2.67.0 (schema 1.9.0)
Fix: Reverted docling from 2.67.0 to 2.65.0 to match docling-core schema version

0.26.0 - 2026-01-13

Added

Conversational RAG Application: Full-stack application (app/) with CopilotKit frontend and pydantic-ai AG-UI backend
Next.js frontend with chat interface, citation display, and visual grounding
Starlette backend using pydantic-ai's native AGUIAdapter for streaming
Docker Compose setup for development (docker-compose.dev.yml) and production
Logfire integration for debugging LLM calls
SSE heartbeat to prevent connection timeouts
Chat Agent (haiku.rag.agents.chat): New conversational RAG agent optimized for multi-turn chat
create_chat_agent() factory function for creating chat agents with AG-UI support
SearchAgent for internal query expansion with deduplication
ChatDeps and ChatSessionState for session management
CitationInfo and QAResponse models for structured responses
Natural language document filtering via build_document_filter()
Configurable search limit per agent
Chat TUI (haiku-rag chat): Terminal-based chat interface using Textual
Single chat window with inline tool calls and expandable citations
Visual grounding (v key) reuses inspector's VisualGroundingModal
Database info (i key) shows document/chunk counts and storage info
Keybindings: q quit, Ctrl+L clear chat, Escape focus input
Q/A History Management: Intelligent conversation history with semantic ranking
FIFO queue with 50 max entries
Embedding cache to avoid re-embedding Q/A pairs
rank_qa_history_by_similarity() returns top-K most relevant history entries
Confidence filtering to exclude low-confidence answers from context
Conversational Research Graph: Simplified single-iteration research graph for chat
build_conversational_graph() optimized for conversational Q&A
Context-aware planning (generates fewer sub-questions when history exists)
ConversationalAnswer output type with direct answer and citations

Changed

BREAKING: Module Reorganization: Consolidated all agent code under haiku.rag.agents
Moved haiku.rag.qa → haiku.rag.agents.qa
Moved haiku.rag.graph.research → haiku.rag.agents.research
Added haiku.rag.agents.chat module with conversational RAG agent
Deleted haiku.rag.graph module (research graph now at haiku.rag.agents.research.graph)

Removed

BREAKING: Custom AG-UI Infrastructure: Removed custom AG-UI event handling in favor of pydantic-ai's native AG-UI support
Deleted haiku.rag.graph.agui module (AGUIEmitter, AGUIConsoleRenderer, stream_graph(), create_agui_server())
Removed --agui flag from serve command
Removed --verbose flags from ask and research commands
Removed --interactive flag from research command
Removed AGUIConfig from configuration
Deleted cli_chat.py interactive chat module
Research graph now uses graph.run() directly instead of stream_graph()
For AG-UI streaming, use pydantic-ai's native AGUIAdapter with ToolReturn and StateSnapshotEvent (see app/backend/ for example)
AG-UI Research Example: Removed examples/ag-ui-research/ (replaced by app/)

0.25.0 - 2026-01-12

Fixed

Large Document Storage Overflow: Fixed "byte array offset overflow" panic when vacuuming/rebuilding databases with many large PDF documents (#225)
Root cause: Arrow's 32-bit string column offsets limited to ~2GB per fragment
Changed docling_document_json (string) to docling_document (bytes) with large_binary Arrow type (64-bit offsets)
Added gzip compression for DoclingDocument JSON (~1.4x compression ratio)
Migration automatically compresses existing documents in batches to avoid memory issues
Breaking: Migration is destructive - all table version history is lost after upgrade

Changed

Dependencies: Updated lancedb 0.26.0 → 0.26.1, docling 2.65.0 → 2.67.0

Removed

Legacy Migrations: Removed obsolete database migration files (v0_9_3.py, v0_10_1.py, v0_19_6.py). These migrations were for versions prior to 0.20.0 and are no longer needed since the current release requires a database rebuild anyway.

0.24.2 - 2026-01-08

Fixed

Base64 Images in Expanded Context: Fixed base64 image data leaking into expanded search results when expand_context() processed PictureItem objects. The issue was PictureItem.export_to_markdown() defaulting to EMBEDDED mode. Now explicitly uses PLACEHOLDER mode to prevent base64 data while still including VLM descriptions and captions.

0.24.1 - 2026-01-08

Fixed

OpenAI Non-Reasoning Models: Fixed reasoning_effort parameter being sent to non-reasoning OpenAI models (gpt-4o, gpt-4o-mini), causing 400 errors. Now correctly detects reasoning models (o1, o3 series) using pydantic-ai's model profile.
Bedrock Non-Reasoning Models: Fixed same issue for OpenAI models on Bedrock.

0.24.0 - 2026-01-07

Added

VLM Picture Description: Describe embedded images using Vision Language Models during document conversion
Images are sent to a VLM for automatic description via OpenAI-compatible API
Descriptions become searchable text, improving RAG retrieval for visual content
Configure via processing.conversion_options.picture_description with enabled, model, timeout, max_tokens
Default prompt customizable via prompts.picture_description
Requires OpenAI-compatible /v1/chat/completions endpoint (Ollama, OpenAI, vLLM, LM Studio)

0.23.2 - 2026-01-05

Fixed

AG-UI Concurrent Step Tracking: Emitter now correctly tracks multiple concurrent steps (#216)

Changed

Dependencies: Updated core and development dependencies

0.23.1 - 2025-12-29

Added

Contextualized FTS Search: Full-text search now includes section headings
New content_fts column stores contextualized content (headings + body text)
FTS index now searches content_fts for better keyword matching on section context
Original content column preserved for display and context expansion
Migration automatically populates content_fts for existing databases
GitHub Actions CI: Test workflow runs pytest, pyright, and ruff on push/PR to main
VCR Cassette Recording: Integration tests use recorded HTTP responses for deterministic CI runs
LLM tests (QA, embeddings, research graph) replay from cassettes without real API calls
docling-serve tests run without Docker container in CI
Uses pytest-recording with custom JSON body serializer

0.23.0 - 2025-12-26

Added

Prompt Customization: Configure agent prompts via prompts config section
domain_preamble: Prepended to all agent prompts for domain context
qa: Full replacement for QA agent prompt
synthesis: Full replacement for research synthesis prompt

Changed

Embeddings: Migrated to pydantic-ai's embeddings module
Uses pydantic-ai v1.39.0+ embeddings with instrumentation and token counting support
Explicit embed_query() and embed_documents() API for query/document distinction
New providers available: Cohere (cohere:), SentenceTransformers (sentence-transformers:)
VoyageAI refactored to extend pydantic-ai's EmbeddingModel base class
Configuration: Added base_url to ModelConfig and EmbeddingModelConfig
Enables custom endpoints for OpenAI-compatible providers (vLLM, LM Studio, etc.)
Model-level base_url takes precedence over provider config

Deprecated

vLLM and LM Studio providers: Use openai provider with base_url instead
provider: vllm → provider: openai with base_url: http://localhost:8000/v1
provider: lm_studio → provider: openai with base_url: http://localhost:1234/v1

Removed

Deleted obsolete embedder implementations: ollama.py, openai.py, vllm.py, lm_studio.py, base.py
Removed VLLMConfig and LMStudioConfig from configuration (use base_url in model config instead)

0.22.0 - 2025-12-19

Added

Read-Only Mode: Global --read-only CLI flag for safe database access without modifications
Blocks all write operations at the Store layer
Skips database upgrades and settings saves on open
Excludes write tools (add_document_*, delete_document) from MCP server
Disables file monitor with warning when --read-only is used with serve --monitor
Time Travel: Query the database as it existed at a previous point in time
Global --before CLI flag accepts datetime strings (ISO 8601 or date-only)
Automatically enables read-only mode when time-traveling
New history command shows version history for database tables
Useful for debugging and auditing
Supported throughout: CLI, Client, App, Inspector

Fixed

File Monitor Path Validation: Monitor now validates directories exist before watching (#204)
Provides clear error message pointing to haiku.rag.yaml configuration
Prevents cryptic FileNotFoundError: No path was found from watchfiles
Docker Documentation: Improved Docker setup instructions
Added volume mount examples for config file and documents directory
Clarified that monitor.directories must use container paths, not host paths

Changed

Dependencies: Updated core dependencies
pydantic-ai-slim: 1.27.0 → 1.36.0 (FileSearchTool, web chat UI, GPT-5.2 support, prompt caching)
lancedb: 0.25.3 → 0.26.0
docling: 2.64.0 → 2.65.0
docling-core: 2.54.0 → 2.57.0

0.21.0 - 2025-12-18

Added

Interactive Research Mode: Human-in-the-loop research using graph-based decision nodes
haiku-rag research --interactive starts conversational CLI chat
Natural language interpretation for user commands (search, modify questions, synthesize)
Chat with assistant before starting research, and during decision points
Review collected answers and pending questions at each decision point
Add, remove, or modify sub-questions through natural conversation
New human_decide graph node emits AG-UI tool calls (TOOL_CALL_START/ARGS/END) for frontend integration
New emit_tool_call_start(), emit_tool_call_args(), emit_tool_call_end() AG-UI event helpers
New AGUIEmitter.emit() method for direct event emission
AG-UI Research Example: Human-in-the-loop research with client-side tool calling
Frontend handles human_decision tool calls via AG-UI TOOL_CALL_* events
Tool results sent directly to backend /v1/research/stream endpoint
Backend queues decisions and continues the research graph
HotpotQA Evaluation: Added HotpotQA dataset adapter for multi-hop QA benchmarks
Extracts unique documents from validation set context paragraphs
Uses MAP for retrieval evaluation (multiple supporting documents per question)
Run with evaluations hotpotqa
Plain Text Format: Added format="plain" for text conversion
Use when content is plain text without markdown/HTML structure
Falls back gracefully when docling cannot detect markdown format in content
Supported in create_document(), convert(), and all converter classes

Changed

AG-UI Events: Replaced custom event classes with ag_ui.core types
Removed haiku.rag.graph.agui.events module
Event factory functions (emit_*) now wrap official ag_ui.core event classes
Chunker Sets Order: Chunkers now set chunk.order directly
Unified Research Graph: Simplified and unified research and deep QA into a single configurable graph
Removed analyze_insights node - graph now flows directly from collect_answers to decide
Simplified EvaluationResult to: is_sufficient, confidence_score, reasoning, new_questions
Simplified ResearchContext - removed insight/gap tracking methods
ask --deep now uses research graph with max_iterations=2, confidence_threshold=0.0
ask --deep output now shows executive summary, key findings, and sources
Added include_plan parameter to build_research_graph() for plan-less execution
Added max_iterations and confidence_threshold overrides to ResearchState.from_config()
Improved Synthesis Prompt: Updated synthesis agent prompt to produce direct answers
Executive summary now directly answers the question instead of describing the report
Added explicit examples of good vs bad output style
Evaluations Vacuum Strategy: populate_db now uses periodic vacuum to prevent disk exhaustion with large datasets
Disables auto_vacuum during population, vacuums every N documents with retention=0
New --vacuum-interval CLI option (default: 100) to control vacuum frequency
Prevents disk space issues when building databases with thousands of documents (e.g., HotpotQA)
Benchmarks Documentation: Restructured benchmarks.md for clarity
Added dedicated Methodology section explaining MRR, MAP, and QA Accuracy metrics
Organized results by dataset with retrieval and QA subsections

Removed

Deep QA Graph: Removed haiku.rag.graph.deep_qa module entirely
Use build_research_graph() with appropriate parameters instead
ask --deep CLI command now uses research graph internally
Insight/Gap Tracking: Removed over-engineered insight and gap tracking from research graph
Removed InsightRecord, GapRecord, InsightAnalysis, InsightStatus, GapSeverity models
Removed format_analysis_for_prompt() helper
Removed INSIGHT_AGENT_PROMPT from prompts

0.20.2 - 2025-12-12

Fixed

LLM Schema Compliance: Improved prompts to prevent LLMs from returning objects instead of plain strings for list[str] fields
All graph prompts now explicitly state that list fields must contain plain strings only
Added missing query and confidence fields to search agent output format documentation
Fixes validation errors with less capable models that ignore JSON schema constraints
AG-UI Frontend Types: Fixed TypeScript interfaces in ag-ui-research example to match backend Python models
EvaluationResult: confidence → confidence_score, should_continue → is_sufficient, gaps_identified → gaps, follow_up_questions → new_questions, added key_insights
ResearchReport: question → title, summary → executive_summary, findings → main_findings, removed insights_used/methodology, added limitations/recommendations/sources_summary
Updated Final Report UI to display new fields (Recommendations, Limitations, Sources)
Citation Formatting: Citations in CLI now render properly with Rich panels
Content is rendered as markdown with proper code block formatting
No longer truncates or flattens newlines in citation content

0.20.1 - 2025-12-11

Added

Search Filter for Graphs: Research and Deep QA graphs now support search_filter parameter to restrict searches to specific documents
Set state.search_filter to a SQL WHERE clause (e.g., "id IN ('doc1', 'doc2')") before running the graph
Enables document-scoped research workflows
CLI: haiku-rag research "question" --filter "uri LIKE '%paper%'"
CLI: haiku-rag ask "question" --filter "title = 'My Doc'"
Python: client.ask(question, filter="...") and agent.answer(question, filter="...")
AG-UI Research Example: Added bidirectional state demonstration with document filter
New /api/documents endpoint to list available documents
Frontend document selector component with search and multi-select
Demonstrates client-to-server state flow via AG-UI protocol
Inspector Info Modal: New i keyboard shortcut opens a modal displaying database information

Changed

Inspector Lazy Loading: Chunks panel now loads chunks in batches of 50 with infinite scroll
Fixes unresponsive UI when viewing documents with large numbers of chunks
New ChunkRepository.get_by_document_id() pagination with limit and offset parameters
New ChunkRepository.count_by_document_id() method

0.20.0 - 2025-12-10

Added

DoclingDocument Storage: Full DoclingDocument JSON is now stored with each document, enabling rich context and visual grounding
Documents store the complete DoclingDocument structure (JSON) and schema version
Chunks store metadata with JSON pointer references (doc_item_refs), semantic labels, section headings, and page numbers
New ChunkMetadata model for structured chunk provenance: doc_item_refs, headings, labels, page_numbers
Document.get_docling_document() method to parse stored DoclingDocument
ChunkMetadata.resolve_doc_items() to resolve JSON pointer refs to actual DocItem objects
ChunkMetadata.resolve_bounding_boxes() for visual grounding with page coordinates
LRU cache (100 documents) for parsed DoclingDocument objects to avoid repeated JSON parsing
Enhanced Search Results: search() and expand_context() now return full provenance information
SearchResult includes page_numbers, headings, labels, and doc_item_refs
QA and research agents use provenance for better citations (page numbers, section headings)
Type-Aware Context Expansion: expand_context() now uses document structure for intelligent expansion
Structural content (tables, code blocks, lists) expands to complete structures regardless of chunking
Text content uses radius-based expansion via text_context_radius setting
max_context_items and max_context_chars settings control expansion limits
SearchResult.format_for_agent() method formats expanded results with metadata for LLM consumption
Visual Grounding: View page images with highlighted bounding boxes for chunks
Inspector modal with keyboard navigation between pages
CLI command: haiku-rag visualize <chunk_id>
Requires textual-image dependency and terminal with image support
Processing Primitives: New methods for custom document processing pipelines
convert() - Convert files, URLs, or text to DoclingDocument
chunk() - Chunk a DoclingDocument into Chunk objects
contextualize() - Prepend section headings to chunk content for embedding
embed_chunks() - Generate embeddings for chunks
New import_document() Method: Import pre-processed documents with custom chunks
Accepts DoclingDocument directly for rich metadata (visual grounding, page numbers)
Use when document conversion, chunking, or embedding were done externally
Chunks without embeddings are automatically embedded
Automatic Chunk Embedding: import_document() and update_document() automatically embed chunks that don't have embeddings
Pass chunks with or without embeddings - missing embeddings are generated
Chunks with pre-computed embeddings are stored as-is
Format Parameter for Text Conversion: New format parameter for convert() and create_document() to specify content type
Supports "md" (default) for markdown and "html" for HTML content
HTML format preserves document structure (headings, lists, sections) in DoclingDocument
Enables proper parsing of HTML content that was previously treated as plain text
Inspector Context Modal: Press c in the inspector to view expanded context for the selected chunk
Auto-Vacuum Configuration: New storage.auto_vacuum setting to control automatic vacuuming behavior
When true (default), vacuum runs automatically after document create/update operations and rebuilds
When false, vacuum only runs via explicit haiku-rag vacuum command
Disabling can help avoid potential crashes in high-concurrency scenarios due to LanceDB race conditions

Changed

BREAKING: create_document() API: Removed chunks parameter
create_document() now always processes content (converts, chunks, embeds)
Use import_document() for pre-processed documents with custom chunks
BREAKING: update_document() API: Unified with update_document_fields()
Old: update_document(document) - pass modified Document object
New: update_document(document_id, content=, metadata=, chunks=, title=, docling_document=)
content and docling_document are mutually exclusive
BREAKING: Chunker Interface: DocumentChunker.chunk() now returns list[Chunk] instead of list[str]
Chunks include structured metadata (doc_item_refs, labels, headings, page_numbers)
Search Config: New settings in search section for search behavior and context expansion
search.limit - Default number of search results (default: 5). Used by CLI, MCP server, and API when no limit specified
search.context_radius - DocItems before/after to include for text content expansion (default: 0)
search.max_context_items - Maximum items in expanded context (default: 10)
search.max_context_chars - Maximum characters in expanded context (default: 10000)
Rebuild Performance: Batched database writes during rebuild command reduce LanceDB versions by ~98%
All rebuild modes (FULL, RECHUNK, EMBED_ONLY) now batch writes across documents
Eliminates redundant per-document chunk deletions and vacuum calls
Significantly reduces storage overhead and improves rebuild speed for large databases
Embedding Architecture: Moved embedding generation from ChunkRepository to client layer
Repository is now a pure persistence layer
Client handles embedding via _ensure_chunks_embedded()
Chunk Text Storage: Chunks store raw text; headings prepended only at embedding time
Stored chunk content stays clean without duplicate heading prefixes
Local and serve chunkers now produce identical output
Citation Models: Introduced RawSearchAnswer for LLM output, SearchAnswer with resolved citations
Page Image Generation: Always enabled for local docling converter (required for visual grounding)
Download Models Progress: haiku-rag download-models now shows real-time progress with Rich progress bars for Ollama model downloads

Removed

BREAKING: markdown_preprocessor Config Option: Use processing primitives (convert(), chunk(), embed_chunks()) for custom pipelines
update_document_fields(): Merged into update_document()

Migration

This release requires a database rebuild to populate the new DoclingDocument fields:

haiku-rag rebuild

Existing documents without DoclingDocument data will work but won't have provenance information.

0.19.6 - 2025-12-03

Changed

BREAKING: Explicit Database Creation: Databases must now be explicitly created before use
New haiku-rag init command creates a new empty database
Python API: HaikuRAG(path, create=True) to create database programmatically
Operations on non-existent databases raise FileNotFoundError
BREAKING: Embeddings Configuration: Restructured to nested EmbeddingModelConfig
Config path changed from embeddings.{provider, model, vector_dim} to embeddings.model.{provider, name, vector_dim}
Automatic migration upgrades existing databases to new format
Database Migrations: Always run when opening an existing database

0.19.5 - 2025-12-01

Changed

Rebuild Performance: Optimized rebuild --embed-only to use batch updates via LanceDB's merge_insert instead of individual chunk updates, and skip chunks with unchanged embeddings

0.19.4 - 2025-11-28

Added

Rebuild Modes: New options for rebuild command to control what gets rebuilt
--embed-only: Only regenerate embeddings, keeping existing chunks (fastest option when changing embedding model)
--rechunk: Re-chunk from existing document content without accessing source files
Default (no flag): Full rebuild with source file re-conversion
Python API: rebuild_database(mode=RebuildMode.EMBED_ONLY | RECHUNK | FULL)

0.19.3 - 2025-11-27

Changed

Async Chunker: DoclingServeChunker now uses httpx.AsyncClient instead of sync requests

Fixed

OCR Options: Fixed DoclingLocalConverter using base OcrOptions class which docling's OCR factory doesn't recognize. Now uses OcrAutoOptions for automatic OCR engine selection.
Dependencies: Added opencv-python-headless to the docling optional dependency for table structure detection.

0.19.2 - 2025-11-27

Changed

Async Converters: Made document converters fully async
BaseConverter.convert_file() and convert_text() are now async methods
DoclingLocalConverter wraps blocking Docling operations with asyncio.to_thread()
DoclingServeConverter now uses httpx.AsyncClient instead of sync requests
Async Model Prefetch: prefetch_models() is now async
Uses httpx.AsyncClient for Ollama model pulls
Wraps blocking Docling and HuggingFace downloads with asyncio.to_thread()

0.19.1 - 2025-11-26

Added

LM Studio Provider: Added support for LM Studio as a provider for embeddings and QA/research models
Configure with provider: lm_studio in embeddings, QA, or research model settings
Supports thinking control for reasoning models (gpt-oss, etc.)
Default base URL: http://localhost:1234

Fixed

Configuration: Fixed init-config command generating invalid configuration files (#165)
Refactored generate_default_config() to use Pydantic model serialization instead of manual dict construction
Updated qa, research, and reranking sections to use new ModelConfig structure

0.19.0 - 2025-11-25

Added

Model Customization: Added support for per-model configuration settings
New enable_thinking parameter to control reasoning behavior (true/false/None)
Support for temperature and max_tokens settings on QA and research models
All settings apply to any provider that supports them
Database Inspector: New inspect CLI command launches interactive TUI for browsing documents and chunks & searching
Evaluations: Added evaluations CLI script for running benchmarks (replaces python -m evaluations.benchmark)
Evaluations: Added --db option to override evaluation database path
Default database location moved to haiku.rag data directory:
- macOS: ~/Library/Application Support/haiku.rag/evaluations/dbs/
- Linux: ~/.local/share/haiku.rag/evaluations/dbs/
- Windows: C:/Users/<USER>/AppData/Roaming/haiku.rag/evaluations/dbs/
Previously stored in evaluations/data/ within the repository
Evaluations: Added comprehensive experiment metadata tracking for better reproducibility
Records dataset name, test case count, and all model configurations
Tracks embedder settings: provider, model, and vector dimensions
Tracks QA model: provider and model name
Tracks judge model: provider and model name for LLM evaluation
Tracks processing parameters: chunk_size and context_chunk_radius
Tracks retrieval configuration: retrieval_limit for number of chunks retrieved
Tracks reranking configuration: rerank_provider and rerank_model
Enables comparison of evaluation runs with different configurations in Logfire
Evaluations: Refactored retrieval evaluation to use pydantic-ai experiment framework
New evaluators module with MRREvaluator (Mean Reciprocal Rank) and MAPEvaluator (Mean Average Precision)
Retrieval benchmarks now use Dataset.evaluate() with full Logfire experiment tracking
Dataset specifications now declare their retrieval evaluator (MRR for RepliQA, MAP for Wix)
Replaced Recall@K and Success@K with industry-standard MRR and MAP metrics
Unified evaluation framework for both retrieval and QA benchmarks
AG-UI Events: Enhanced ActivitySnapshot events with richer structured data
Added stepName field to identify which graph node emitted each activity
Added structured fields to activity content while preserving backward-compatible message field:
- Planning: sub_questions - list of sub-question strings
- Searching: query - the search query, confidence - answer confidence (on success), error - error message (on failure)
- Analyzing (research): insights - list of insight objects, gaps - list of gap objects, resolved_gaps - list of resolved gap strings
- Evaluating (research): confidence - confidence score, is_sufficient - sufficiency flag
- Evaluating (deep QA): is_sufficient - sufficiency flag, iterations - iteration count

Changed

Evaluations: Renamed --qa-limit CLI parameter to --limit, now applies to both retrieval and QA benchmarks
Evaluations: Retrieval evaluator selection moved from runtime logic to dataset configuration

0.18.0 - 2025-11-21

Added

Manual Vector Indexing: New create-index CLI command for explicit vector index creation
Creates IVF_PQ indexes
Requires minimum 256 chunks (LanceDB training data requirement)
New search.vector_index_metric config option: cosine (default), l2, or dot
New search.vector_refine_factor config option (default: 30) for accuracy/speed tradeoff
Indexes not created automatically during ingestion to avoid performance degradation
Manual rebuilding required after adding significant new data
Enhanced Info Command: haiku-rag info now shows storage sizes and vector index statistics
Displays storage size for documents and chunks tables in human-readable format
Shows vector index status (exists/not created)
Shows indexed and unindexed chunk counts for monitoring index staleness

Changed

BREAKING: Default Embedding Model: Changed default embedding model from qwen3-embedding to qwen3-embedding:4b with vector dimension 2560 (previously 4096)
New installations will use the smaller, more efficient 4B parameter model by default
Action required: Existing databases created with the old default will be incompatible. Users must either:
- Explicitly set embeddings.model: "qwen3-embedding" and embeddings.vector_dim: 4096 in their config to maintain compatibility with existing databases
- Or run haiku-rag rebuild to re-embed all documents with the new default
This change provides better performance for most use cases while reducing resource requirements
Evaluations: Improved evaluation dataset naming and simplified evaluator configuration
EvalDataset now accepts dataset name for better organization in Logfire
Added --name CLI parameter to override evaluation run names
Removed IsInstance evaluator, using only LLMJudge for QA evaluation
Search Accuracy: Applied refine_factor to vector and hybrid searches for improved accuracy
Retrieves refine_factor * limit candidates and re-ranks in memory
Higher values increase accuracy but slow down queries

Fixed

AG-UI Activity Events: Activity events now correctly use structured dict content instead of strings
Graph Configuration: Graph builder functions now properly accept and use non-global config (#149)
build_research_graph() and build_deep_qa_graph() now pass config to all agents and model creation
get_model() utility function accepts config parameter (defaults to global Config)
Allows creating multiple graphs with different configurations in the same application

0.17.2 - 2025-11-19

Added

Document Update API: New update_document_fields() method for partial document updates
Update individual fields (content, metadata, title, chunks) without fetching full document
Support for custom chunks or auto-generation from content

Changed

Chunk Creation: ChunkRepository.create() now accepts both single chunks and lists for batch insertion
Batch insertion reduces LanceDB version creation when adding multiple chunks with custom chunks
Batch embedding generation for improved performance with multiple chunks
Updated core dependencies

0.17.1 - 2025-11-18

Added

Conversion Options: Fine-grained control over document conversion for both local and remote converters
New conversion_options config section in ProcessingConfig
OCR settings: do_ocr, force_ocr, ocr_lang for controlling OCR behavior
Table extraction: do_table_structure, table_mode (fast/accurate), table_cell_matching
Image settings: images_scale to control image resolution
Options work identically with both docling-local and docling-serve converters

Changed

Increase reranking candidate retrieval multiplier from 3x to 10x for improved result quality
Docker Images: Main haiku.rag image no longer automatically built and published
Conversion Options: Removed the legacy pdf_backend setting; docling now chooses the optimal backend automatically

0.17.0 - 2025-11-17

Added

Remote Processing: Support for docling-serve as remote document processing and chunking service
New converter config option: docling-local (default) or docling-serve
New chunker config option: docling-local (default) or docling-serve
New providers.docling_serve config section with base_url, api_key, and timeout
Comprehensive error handling for connection, timeout, and authentication issues
Chunking Strategies: Support for both hybrid and hierarchical chunking
New chunker_type config option: hybrid (default) or hierarchical
Hybrid chunking: Structure-aware splitting that respects document boundaries
Hierarchical chunking: Preserves document hierarchy for nested documents
Table Serialization Control: Configurable table representation in chunks
New chunking_use_markdown_tables config option (default: false)
false: Tables serialized as narrative text ("Value A, Column 2 = Value B")
true: Tables preserved as markdown format with structure
Chunking Configuration: Additional chunking control options
New chunking_merge_peers config option (default: true) to merge undersized successive chunks
Docker Images: Two Docker images for different deployment scenarios
haiku.rag: Full image with all dependencies for self-contained deployments
haiku.rag-slim: Minimal image designed for use with external docling-serve
Multi-platform support (linux/amd64, linux/arm64)
Docker Compose examples with docling-serve integration
Automated CI/CD workflows for both images
Build script (scripts/build-docker-images.sh) for local multi-platform builds

Changed

BREAKING: Chunking Tokenizer: Switched from tiktoken to HuggingFace tokenizers for consistency with docling-serve
Default tokenizer changed from tiktoken "gpt-4o" to "Qwen/Qwen3-Embedding-0.6B"
New chunking_tokenizer config option in ProcessingConfig for customization
download-models CLI command now also downloads the configured HuggingFace tokenizer
Docker Examples: Updated examples to demonstrate remote processing
examples/docker now uses slim image with docling-serve
examples/ag-ui-research backend uses slim image with docling-serve
Configuration examples include remote processing setup

0.16.1 - 2025-11-14

Changed

Evaluations: Refactored QA benchmark to run entire dataset as single evaluation for better Logfire experiment tracking
Evaluations: Added .env file loading support via python-dotenv dependency

0.16.0 - 2025-11-13

Added

AG-UI Protocol Support: Full AG-UI (Agent-UI) protocol implementation for graph execution with event streaming
New AGUIEmitter class for emitting AG-UI events from graphs
Support for all AG-UI event types: lifecycle events (RUN_STARTED, RUN_FINISHED, RUN_ERROR), step events (STEP_STARTED, STEP_FINISHED), state updates (STATE_SNAPSHOT, STATE_DELTA), activity narration (ACTIVITY_SNAPSHOT), and text messages (TEXT_MESSAGE_CHUNK)
AGUIConsoleRenderer for rendering AG-UI event streams to terminal with Rich formatting
stream_graph() utility function for executing graphs with AG-UI event emission
State diff computation for efficient state synchronization
Delta State Updates: AG-UI emitter now supports incremental state updates via JSON Patch operations (STATE_DELTA events) to reduce bandwidth, configurable via use_deltas parameter (enabled by default)
AG-UI Server: Starlette-based HTTP server for serving graphs via AG-UI protocol
Server-Sent Events (SSE) streaming endpoint at /v1/agent/stream
Health check endpoint at /health
Full CORS support configurable via agui config section
create_agui_server() function for programmatic server creation
Deep QA AG-UI Support: Deep QA graph now fully supports AG-UI event streaming
Integration with AGUIEmitter for progress tracking
Step-by-step execution visibility via AG-UI events
CLI AG-UI Flag: New --agui flag for serve command to start AG-UI server
Graph Module: New unified haiku.rag.graph module containing all graph-related functionality
Common Graph Nodes: New factory functions (create_plan_node, create_search_node) in haiku.rag.graph.common.nodes for reusable graph components
AG-UI Research Example: New full-stack example (examples/ag-ui-research) demonstrating agent+graph architecture with CopilotKit frontend
Pydantic AI agent with research tool that invokes the research graph
Custom AG-UI streaming endpoint with anyio memory streams
React/Next.js frontend with split-pane UI showing live research state
Real-time progress tracking of questions, answers, insights, and gaps
Docker Compose setup for easy local development

Changed

Vacuum Retention: Default vacuum_retention_seconds increased from 60 seconds to 86400 seconds (1 day) for better version retention in typical workflows
BREAKING: Major refactoring of graph-related code into unified haiku.rag.graph module structure:
haiku.rag.research → haiku.rag.graph.research
haiku.rag.qa.deep → haiku.rag.graph.deep_qa
haiku.rag.agui → haiku.rag.graph.agui
haiku.rag.graph_common → haiku.rag.graph.common
BREAKING: Research and Deep QA graphs now use AG-UI event protocol instead of direct console logging
Removed console and stream parameters from graph dependencies
All progress updates now emit through AGUIEmitter
BREAKING: ResearchState converted from dataclass to Pydantic BaseModel for JSON serialization and AG-UI compatibility
Research and Deep QA graphs now emit detailed execution events for better observability
CLI research command now uses AG-UI event rendering for --verbose output
Improved graph execution visibility with step-by-step progress tracking
Updated all documentation to reflect new import paths and AG-UI usage
Updated examples (ag-ui-research, a2a-server) to use new import paths

Fixed

Document Creation: Optimized create_document to skip unnecessary DoclingDocument conversion when chunks are pre-provided
FileReader: Error messages now include both original exception details and file path for easier debugging
Database Auto-creation: Read operations (search, list, get, ask, research) no longer auto-create empty databases. Write operations (add, add-src, delete, rebuild) still create the database as needed. This prevents the confusing scenario where a search query creates an empty database. Fixes issue #137.

Removed

BREAKING: Removed disable_autocreate config option - the behavior is now automatic based on operation type
BREAKING: Removed legacy ResearchStream and ResearchStreamEvent classes (replaced by AG-UI event protocol)

0.15.0 - 2025-11-07

Added

File Monitor: Orphan deletion feature - automatically removes documents from database when source files are deleted (enabled via monitor.delete_orphans config option, default: false)

Changed

Configuration: All CLI commands now properly support --config parameter for specifying custom configuration files
Configuration loading consolidated across CLI, app, and client with consistent resolution order
HaikuRAGApp and MCP server now accept config parameter for programmatic configuration
Updated CLI documentation to clarify global vs per-command options
BREAKING: Standardized configuration filename to haiku.rag.yaml in user directories (was incorrectly using config.yaml). Users with existing config.yaml in their user directory will need to rename it to haiku.rag.yaml

Fixed

File Monitor: Fixed incorrect "Updated document" logging for unchanged files - monitor now properly skips files when MD5 hash hasn't changed

Removed

BREAKING: A2A (Agent-to-Agent) protocol support has been moved to a separate self-contained package in examples/a2a-server/. The A2A server is no longer part of the main haiku.rag package. Users who need A2A functionality can install and run it from the examples directory with cd examples/a2a-server && uv sync.
BREAKING: Removed deprecated .env-based configuration system. The haiku-rag init-config --from-env command and load_config_from_env() function have been removed. All configuration must now be done via YAML files. Environment variables for API keys (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY) and service URLs (e.g., OLLAMA_BASE_URL) are still supported and can be set via .env files.

0.14.1 - 2025-11-06

Added

Migrated research and deep QA agents to use Pydantic Graph beta API for better graph execution
Automatic semaphore-based concurrency control for parallel sub-question processing
max_concurrency parameter for controlling parallel execution in research and deep QA (default: 1)

Changed

BREAKING: Research and Deep QA graphs now use pydantic_graph.beta instead of the class-based graph implementation
Refactored graph common patterns into graph_common module
Sub-questions now process using .map() for true parallel execution
Improved graph structure with cleaner node definitions and flow control
Pinned critical dependencies: docling-core, lancedb, docling

0.14.0 - 2024-11-05

Added

New haiku.rag-slim package with minimal dependencies for users who want to install only what they need
Evaluations package (haiku.rag-evals) for internal benchmarking and testing
Improved search filtering performance by using pandas DataFrames for joins instead of SQL WHERE IN clauses

Changed

BREAKING: Restructured project into UV workspace with three packages:
haiku.rag-slim - Core package with minimal dependencies
haiku.rag - Full package with all extras (recommended for most users)
haiku.rag-evals - Internal benchmarking and evaluation tools
Migrated from pydantic-ai to pydantic-ai-slim with extras system
Docling is now an optional dependency (install with haiku.rag-slim[docling])
Package metadata checks now use haiku.rag-slim (always present) instead of haiku.rag
Docker image optimized: removed evaluations package, reducing installed packages from 307 to 259
Improved vector search performance through optimized score normalization

Fixed

ImportError now properly raised when optional docling dependency is missing

0.13.3 - 2024-11-04

Added

Support for Zero Entropy reranker
Filter parameter to search() for filtering documents before search
Filter parameter to CLI search command
Filter parameter to CLI list command for filtering document listings
Config option to pass custom configuration files to evaluation commands
Document filtering now respects configured include/exclude patterns when using add-src with directories
Max retries to insight_agent when producing structured output

Fixed

CLI now loads .env files at startup
Info command no longer attempts to use deprecated .env settings
Documentation typos

0.13.2 - 2024-11-04

Added

Gitignore-style pattern filtering for file monitoring using pathspec
Include/exclude pattern documentation for FileMonitor

Changed

Moved monitor configuration to its own section in config
Improved configuration documentation
Updated dependencies

0.13.1 - 2024-11-03

Added

Initial version tracking