Changelog
Unreleased
Changed
- Bump docling>=2.93.0 and docling-core>=2.75.0.
- Bump pydantic-ai-slim>=1.96.0. Migrate off deprecated APIs: AG-UI imports use pydantic_ai.ui.ag_ui, docs/CLI examples use the explicit openai-chat: model prefix, and Agent(retries=) is split into tool_retries= + output_retries=.
- Bump pydantic-monty>=0.0.17. Migrate off deprecated pydantic_monty.run_repl_async(repl, ...) to repl.feed_run_async(...).
- Cap transformers<5.0.0 in the mxbai extra: mxbai-rerank>=0.1.6 calls tokenizer.prepare_for_model, which transformers 5 removed.
- Refresh the rest of the lockfile to latest within current constraints (pydantic, pydantic-ai, rich, ruff, ty, pytest, torch, textual, textual-image, watchfiles, pre-commit, datasets, and transitives).
0.47.0 - 2026-05-14
Added
- cross-encoder reranking provider. Runs any HuggingFace cross-encoder reranker in-process via sentence_transformers.CrossEncoder, with no separate server. Useful for BGE (BAAI/bge-reranker-v2-m3), Qwen3-Reranker, MS-MARCO MiniLM, and other CrossEncoder-compatible models when vLLM is not an option. New [cross-encoder] extra pulls sentence-transformers.
Fixed
- rebuild --embed-only no longer buffers the entire corpus in memory. The previous implementation accumulated every chunk's id, content, content_fts, metadata, and new embedding vector in a single Python list before flushing. The rebuild now stream-copies non-vector columns into a chunks_rebuild_staging table (1000 rows / page), recreates the chunks table fresh to honour vector-dim changes, then streams from staging one document at a time, embedding in batches of embeddings.batch_size and flushing to the new chunks table every 50 documents.
- rebuild --embed-only is now idempotent across crashes. A second table, chunks_rebuild_marker, is written immediately after phase 1 (staging copy) finishes. Its presence flips the next rebuild into resume mode: phase 1 is skipped, the live chunks table is recreated, and phase 2 (re-embed) runs from the existing staging snapshot. Cleanup drops the marker before the staging table, so an interruption between the two drops leaves a markerless staging table that the next run discards harmlessly. A staging table without a marker is treated as a partial phase 1 and dropped (the live chunks table is still authoritative). Running a non-embed-only mode (FULL / RECHUNK / DESCRIPTIONS / TITLE_ONLY) after a crashed embed-only run correctly discards the staging recovery state. Phase 1's pagination was switched from offset/limit to to_batches, removing the latent offset-drift risk and the O(N²) cost at high offsets.
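The crash-recovery rules for rebuild --embed-only come down to a decision on which leftover tables exist at startup. The following is a minimal sketch of that decision only; the names RebuildAction and plan_embed_only_rebuild are hypothetical and are not taken from the codebase.

```python
from enum import Enum


class RebuildAction(Enum):
    FRESH = "fresh"                  # run phase 1 (staging copy), then phase 2
    RESUME = "resume"                # skip phase 1, re-embed from staging
    DROP_STAGING_THEN_FRESH = "drop_staging_then_fresh"


def plan_embed_only_rebuild(has_staging: bool, has_marker: bool) -> RebuildAction:
    """Pick the starting point for an embed-only rebuild (illustrative only)."""
    if has_staging and has_marker:
        # Phase 1 completed before the crash, so the staging snapshot is whole.
        return RebuildAction.RESUME
    if has_staging:
        # No marker: either a partial phase 1 or an interrupted cleanup.
        # In both cases the live chunks table is authoritative.
        return RebuildAction.DROP_STAGING_THEN_FRESH
    # Assumption: a marker without staging should not occur, because cleanup
    # drops the marker first. Treat it like a clean start.
    return RebuildAction.FRESH
```

Writing the marker only after phase 1 finishes is what makes its presence a reliable "snapshot complete" signal.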
0.46.0 - 2026-05-13
Added
- processing.conversion_options.fetch_remote_images (default true). Controls whether docling fetches images referenced by URL in HTML and Markdown inputs. docling-local only: docling-serve cannot fetch external images via its API regardless of this flag.
- s3:// is a first-class document source. create_document_from_source, the CLI haiku-rag add-src, and the MCP add_document_from_url tool all dispatch on the s3 URL scheme. Two-stage change detection keeps metadata["md5"] semantically uniform across all sources: a HEAD ETag matching the stored metadata["etag"] short-circuits without a GET; if the ETag differs but the bytes hash to the same MD5 (multipart re-upload, server-side CopyObject, SSE mode change), only the etag refreshes, with no re-chunk or re-embed. Closes #357.
- S3 / object-storage monitoring. monitor.s3: list[S3MonitorEntry] adds a polling watcher per bucket prefix alongside the existing local-directory watcher. Each entry has its own poll_interval, include_patterns, ignore_patterns, delete_orphans, and storage_options. The same serve --monitor flag enables both. Orphan deletion is per-entry (scoped via uri LIKE 's3://bucket/prefix/%'); other buckets and prefixes are never touched.
- [s3] optional extra (obstore>=0.9). Required for s3:// sources and the S3 watcher. Uses obstore, the Python binding to the same Rust object_store crate that LanceDB uses internally, so monitor.s3[*].storage_options accepts the same dict shape as lancedb.storage_options. Empty/missing options fall back to the AWS default credential chain.
- scripts/run-integration-tests.sh wraps docker compose up --wait, pytest -m integration, and tear-down so the SeaweedFS-backed integration suite is a one-liner.
- ModelConfig.extra_body. Optional dict forwarded verbatim to ModelSettings.extra_body, the raw pass-through pydantic-ai exposes for openai/ollama/anthropic/groq. Lets configs reach provider-specific keys without haiku.rag modelling them, e.g. extra_body: {chat_template_kwargs: {enable_thinking: false}} to disable Qwen3 thinking on a vLLM endpoint, where the high-level enable_thinking flag is a no-op.
- embeddings.batch_size (default 512). Number of text chunks per /v1/embeddings call during ingest. Lower it when your provider caps total tokens per request. Closes #365.
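The two-stage change detection described for s3:// sources can be sketched as below. This is an illustration under assumptions, not the shipped implementation: StoredMeta, detect_change, and the fetch_bytes callback are hypothetical names, and the return strings are invented labels.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class StoredMeta:
    etag: str  # ETag recorded at the last ingest
    md5: str   # MD5 of the bytes recorded at the last ingest


def detect_change(
    stored: StoredMeta, head_etag: str, fetch_bytes: Callable[[], bytes]
) -> str:
    """Classify an object as unchanged, etag-refreshed, or changed."""
    # Stage 1: the ETag from a HEAD request matches, so no GET is needed.
    if head_etag == stored.etag:
        return "unchanged"
    # Stage 2: the ETag differs, so download and compare content hashes.
    body = fetch_bytes()
    if hashlib.md5(body).hexdigest() == stored.md5:
        stored.etag = head_etag  # same bytes: refresh the etag only
        return "etag-refreshed"
    return "changed"  # caller would re-chunk and re-embed
```

Only the last outcome would trigger re-chunking and re-embedding; the middle one covers cases such as a multipart re-upload of identical bytes.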
Changed
- Chat TUI streams markdown incrementally. Assistant messages now use Textual's MarkdownStream (Markdown.get_stream) and write per-token deltas instead of re-parsing the entire accumulated message on every token. Removes the O(n²) re-parse that visibly stuttered long responses. Bumps the textual floor to >=8.2.4 so Markdown.get_stream is reachable via the public API.
- Embedding compatibility check only raises on vector_dim mismatch. provider and name drift (legitimate when the same model is served by a different stack, e.g. Ollama → vLLM-via-openai) now logs a one-time warning and updates the stored settings to match the current config. Subsequent opens are silent. Run rebuild --embed-only if you also want to re-embed under the new stack.
- processing.pictures enum replaces picture_description.enabled. Three modes: none (skip picture generation entirely; lower RAM, smaller DBs), description (generate images, run VLM, store bytes), image (default; generate images, store bytes, no VLM). Closes #366. Breaking change: rename picture_description.enabled: true → pictures: description, picture_description.enabled: false → pictures: image. The pre-April-30 generate_picture_images flag is also gone; use pictures: none for that opt-out.
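The relaxed embedding compatibility check can be pictured roughly as follows. This is a sketch under assumptions: reconcile_embedding_settings is a hypothetical helper, and the plain dicts stand in for whatever settings objects the store actually persists.

```python
import logging

logger = logging.getLogger("embedding_compat_sketch")


def reconcile_embedding_settings(stored: dict, current: dict) -> dict:
    """Return the settings to persist; raise only on a dimension mismatch."""
    if stored["vector_dim"] != current["vector_dim"]:
        # Vectors of a different size cannot share the table, so this is fatal.
        raise ValueError(
            f"vector_dim mismatch: stored {stored['vector_dim']}, "
            f"configured {current['vector_dim']}"
        )
    drifted = [key for key in ("provider", "name") if stored[key] != current[key]]
    if drifted:
        logger.warning(
            "Embedding %s changed; updating stored settings", ", ".join(drifted)
        )
        # Persisting the current config is what keeps later opens silent.
        return dict(current)
    return stored
```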
Fixed
- Picture bytes attached to multimodal tool returns are PNG-verified via PIL.Image.verify(). Bytes that fail verification are dropped.
- Conversion options now apply to non-PDF formats. DoclingLocalConverter previously wired its PdfPipelineOptions only to InputFormat.PDF, so user settings (OCR knobs, picture_description.enabled, images_scale, etc.) silently no-op'd for HTML, Markdown, DOCX, PPTX, and IMAGE inputs. The converter now shares a single PdfPipelineOptions instance across PDF, IMAGE, HTML, MD, DOCX, and PPTX FormatOptions. SimplePipeline-backed formats ignore the PDF-specific fields; ConvertPipelineOptions-level enrichments (picture description / classification / chart extraction) now run uniformly. HTML and Markdown additionally receive HTMLBackendOptions / MarkdownBackendOptions gated on fetch_remote_images.
- HTML text ingest path picks up converter options. convert_text(format="html"/"md") previously used a bare DoclingDocConverter() with zero format options (the wix corpus ingest path). It now uses the same shared _build_format_options() helper as the file path.
- Relative <img> paths resolve during URL ingest. HaikuRAG.convert() and the converter convert_file / convert_text methods now thread a source_uri through to HTMLBackendOptions.source_uri / MarkdownBackendOptions.source_uri. URL ingest uses the originating URL; file ingest uses file://; raw text accepts an optional override. docling-serve accepts the kwarg as a no-op (its API has no equivalent option).
- CLI tracebacks no longer dump per-frame locals. The Typer app now passes pretty_exceptions_show_locals=False, so exceptions involving a DoclingDocument (or any large object) print readable rich tracebacks instead of pages of inline base64 image URIs. Set _TYPER_STANDARD_TRACEBACK=1 for plain Python tracebacks.
- Batch ingest no longer hits HF Hub's 429 rate limit. The chunking tokenizer is now loaded once per process via @functools.cache instead of once per chunker instance.
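The once-per-process tokenizer load in the last entry follows a common functools.cache pattern. In the sketch below, get_tokenizer and Chunker are hypothetical stand-ins, and the returned dict replaces the real tokenizer object so the example runs without network access.

```python
import functools


@functools.cache
def get_tokenizer(model_name: str) -> dict:
    """Stand-in for an expensive tokenizer download (hypothetical)."""
    # In a real loader this is where the remote fetch would happen.
    return {"model": model_name}


class Chunker:
    def __init__(self, model_name: str) -> None:
        # Every instance shares the cached tokenizer for a given model name.
        self.tokenizer = get_tokenizer(model_name)


first = Chunker("example-model")
second = Chunker("example-model")
```

Because the cache is keyed on the function arguments, a batch ingest that creates many chunkers for the same model performs the load once.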
Documentation
- New "External image fetching" subsection in docs/configuration/processing.md documenting fetch_remote_images, the SSRF / size / timeout guards inherited from docling, and a per-format table of which conversion options actually apply (PDF, IMAGE, HTML, MD, DOCX/PPTX, others).
- New "HTML Image Fetching" section in docs/remote-processing.md calling out that docling-serve cannot fetch external <img> URLs and recommending docling-local for HTML ingest when picture bytes matter.
- New "S3 / Object Storage Monitoring" section in docs/server.md and docs/configuration/processing.md covering the [s3] extra, polling cadence, ETag semantics, credentials, and CLI usage.
- New "Deployment Pattern: One Writer, Many Readers" subsection in docs/configuration/storage.md documenting the recommended IAM split (one ingestion process + N read-only consumers).
0.45.0 - 2026-05-08
Added
- Vision capabilities. Picture-aware ingestion, vision QA, multimodal embeddings, and image-as-query search.
- Picture bytes always stored at ingest in a new document_items.picture_data column (large_binary), addressable by (document_id, self_ref). Bulk read paths project metadata-only so bytes never leak into context expansion or analysis-sandbox builds. The 0.45.0 migration adds the column on existing DBs and backfills it from each doc's docling blob; URIs are then stripped from the blob so bytes live in one place.
- VLM picture descriptions at ingest via processing.conversion_options.picture_description.enabled (default false). When enabled, descriptions are woven into chunk text. The earlier generate_picture_images flag is dropped with a one-time warning. haiku-rag rebuild --descriptions runs the VLM over stored bytes after the fact, idempotently, skipping the docling parse entirely.
- Multimodal embedder (provider="vllm") for cross-modal retrieval. Talks HTTP to a vLLM /v1/embeddings endpoint (input array for text, messages superset with image_url for images). Tested with Qwen/Qwen3-VL-Embedding-8B and jinaai/jina-embeddings-v4. No new Python ML dependencies. Under multimodal embedders, ingest emits one synthetic picture chunk per PictureItem, sharing the chunks table with text.
- Image-as-query search. client.search() accepts str | bytes | PIL.Image.Image. Image queries embed once and run vector-only against the chunks table. New CLI flag haiku-rag search --image PATH and new MCP tool search_documents_by_image(image_base64, ...) (registered only when the embedder supports images).
- Vision QA via qa.model.vision: bool flag on ModelConfig (default false). When true, the agent's search tool attaches picture bytes as BinaryContent parts on its ToolReturn. Default is false because providers behave inconsistently when an image is sent to a text-only model (Ollama silently accepts and confabulates; OpenAI returns 400). SearchResult.image_data: dict[str, str] | None carries base64 picture bytes keyed by self_ref; client.search() and MCP search_documents gain include_images: bool = True.
- Silent-failure guard for picture descriptions. When picture_description.enabled=true and a conversion returns at least one picture but zero descriptions, log a warning naming the source, picture count, VLM model, and base URL. Surfaces docling-serve's swallowed VLM errors (unreachable host, missing model) before they pollute a long ingest.
- Inspector renders attached pictures under qa.model.vision=true in the context modal (c key) so the inspector reflects what the LLM actually receives.
Fixed
- rebuild --descriptions no longer destroys docling_pages. The previous implementation called set_docling() after a structure-only docling load, which writes docling_pages=None and clobbered page rasters for every doc with at least one undescribed picture (silently breaking visualize_chunk for the affected docs).
- docling-serve picture-image extraction. docling-serve only emits picture bytes under image_export_mode="referenced" (upstream docling-project/docling-serve#576). The converter switches to referenced + target_type="zip" when picture images are requested and rehydrates artifacts/<filename> URIs back into data: URIs.
- rebuild --rechunk reuses the stored docling blob instead of re-converting from the markdown export, which dropped every PictureItem on the floor. Documents without a stored docling blob now raise instead of silently falling back.
Changed
- Lazy document hydration during rebuild. Each mode loop now fetches one full record at a time instead of eagerly loading all docs with their multi-MB blobs. Drops startup memory from ~15 GB to ~one document on a 1000-doc database.
0.44.0 - 2026-04-29
Added
- Skill-based QA evaluation via evaluations run --target {qa,rag-skill,analysis-skill}. Benchmark the RAG and analysis skills end-to-end alongside the existing QA agent path, against the same datasets and judge. --skill-model "provider:name" overrides the skill model independently from the judge.
- Citation retrieval as a second eval metric. CitationMRREvaluator and CitationMAPEvaluator score the URIs the skill registered via the cite tool against each dataset's gold expected_uris, alongside the existing LLMJudge. Console output gains a "Citation Retrieval" summary (mean score, cite rate, mean citations per case). Zero extra skill runs: cited URIs are surfaced via pydantic_evals.set_eval_attribute.
- Bumps haiku.skills to >=0.16.0 for the public run_skill API and Skill.request_limit.
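For readers unfamiliar with the two citation metrics, here are the textbook per-case definitions of reciprocal rank and average precision. This is a sketch of the standard formulas; the exact scoring inside CitationMRREvaluator and CitationMAPEvaluator may differ, and the function names below are hypothetical.

```python
def reciprocal_rank(cited: list[str], expected: set[str]) -> float:
    """1 / rank of the first cited URI that is in the gold set, else 0."""
    for rank, uri in enumerate(cited, start=1):
        if uri in expected:
            return 1.0 / rank
    return 0.0


def average_precision(cited: list[str], expected: set[str]) -> float:
    """Mean of precision@k over the ranks where a new gold URI is cited."""
    if not expected:
        return 0.0
    hits = 0
    total = 0.0
    seen: set[str] = set()
    for rank, uri in enumerate(cited, start=1):
        if uri in expected and uri not in seen:
            seen.add(uri)
            hits += 1
            total += hits / rank
    return total / len(expected)
```

Averaging these per-case values over a dataset gives MRR and MAP respectively.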
Changed
- Pinned eval judge defaults to ollama:qwen3.6. Previously --judge-model defaulted to config.qa.model, so changing the QA or skill model also changed the judge, destabilizing cross-run comparisons and re-introducing self-judging whenever the answerer matched. A 2×2 calibration vs Claude Opus 4.7 (gpt-oss / qwen3.6 as both answerer and judge) showed qwen3.6 had κ ≥ 0.66 on both same- and cross-family answerers (vs 0.39–0.55 for gpt-oss) with no detectable self-preference bias. Pass --judge-model provider:name to override.
- Tightened cite framing in the RAG skill's SKILL.md. cite is now a precondition for the final answer: the model identifies supporting chunk IDs and calls cite before writing the response. The "MUST cite before answering" requirement carries an explicit refusal carve-out so the model does not cite irrelevant chunks when knowledge is missing. On the wix benchmark this lifted cite rate from 32% → 96%, mean cited_map from 0.15 → 0.48, and cut the "correct answer with no citation" pattern from 52% of cases to 1%, with QA accuracy holding at ~78%.
- Removed dataset-specific eval system prompts. WIX_SUPPORT_PROMPT and ORB_SYSTEM_PROMPT duplicated guidance already in the shipped QA_SYSTEM_PROMPT and SKILL.md, and ORB's referenced the obsolete search_documents tool name. The eval-side machinery for injecting them (DatasetSpec.system_prompt, resolve_system_prompt()) is removed. config.prompts.qa remains as the user-facing override knob.
0.43.1 - 2026-04-25
Fixed
- Relative db_path no longer trips the LanceDB cloud-URI sanitizer. The 0.43 migration to lancedb.connect_async started routing the path through an async URI sanitizer that treats anything not clearly an absolute local path as a possible cloud URI, raising ValueError: An api_key is required when connecting to LanceDb Cloud on invocations like haiku-rag info --db db/rag.lancedb. The path is now made absolute before being handed to LanceDB.
0.43.0 - 2026-04-24
Changed
- Native async LanceDB: all table I/O now uses LanceDB's async API (connect_async, AsyncConnection, AsyncTable). Previously, repository methods were declared async def but called blocking sync LanceDB under the hood, stalling the event loop on every read/write. No change to the documented async with HaikuRAG(...) as client: usage pattern.
- BREAKING (internal): HaikuRAG must be used via async with. Store initialization now happens in __aenter__; constructing HaikuRAG(...) and calling methods directly without entering the context manager no longer works.
- BREAKING (internal): download_models is no longer a method on HaikuRAG. It's now a module-level function: from haiku.rag.client.downloads import download_models; async for progress in download_models(config): .... The CLI and in-repo consumers are updated.
- Concurrency: background vacuum tracked as a task on the client. __aexit__ and rebuild_database now await it explicitly, preventing CreateIndex transaction was preempted commit conflicts when destructive operations follow a create_document that scheduled a background vacuum.
Fixed
- Chat TUI now renders citations again. After the 0.42.1 flattening of skill state citations to list[str], the TUI still indexed citations[-1] and iterated the resulting chunk-id string character-by-character, so no citations resolved through citation_index and the citation panel stayed empty. Fixed by iterating state.citations directly.
- search(..., filter=...) no longer silently under-returns. The filter path used to materialize LanceDB's top-N window, filter to matching document_ids in pandas, and head(limit). When matching chunks lived outside that top-N window (selective filters, broad queries), the caller got fewer than limit results even though plenty of matching chunks existed in the index. The document filter is now pushed down into the chunk query as document_id IN (...) so .limit(limit) applies to matching chunks directly. Behavior change: searches that previously under-returned will start returning the requested count.
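The under-return in the search filter entry is easy to reproduce with plain lists. The sketch below simulates both strategies in memory; it does not use LanceDB, and the function names and the (score, document_id) tuples are assumptions made for illustration.

```python
Chunk = tuple[int, str]  # (relevance score, document_id)


def search_post_filter(
    chunks: list[Chunk], allowed: set[str], limit: int, window: int
) -> list[Chunk]:
    """Old behaviour: take the top-N window first, then filter, then head(limit)."""
    top = sorted(chunks, key=lambda c: c[0], reverse=True)[:window]
    return [c for c in top if c[1] in allowed][:limit]


def search_pushdown(chunks: list[Chunk], allowed: set[str], limit: int) -> list[Chunk]:
    """New behaviour: filter first, so the limit applies to matching chunks."""
    matching = [c for c in chunks if c[1] in allowed]
    return sorted(matching, key=lambda c: c[0], reverse=True)[:limit]


# Ten high-scoring chunks from document "a", ten lower-scoring from "b".
corpus = [(20 - i, "a" if i < 10 else "b") for i in range(20)]
old_result = search_post_filter(corpus, {"b"}, limit=3, window=10)
new_result = search_pushdown(corpus, {"b"}, limit=3)
```

Here old_result is empty because every chunk in the top-10 window belongs to document "a", while new_result holds the three best chunks from "b".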
0.42.1 - 2026-04-22
Changed
- BREAKING: Skill state citations is now list[str] instead of list[list[str]]. With per-invocation state scoping (0.42), the outer list no longer tracked turn boundaries; it only grouped chunk ids per cite call within a single invocation, which has no downstream meaning. The field is now a flat, deduplicated list of chunk ids cited during the current invocation. Clients resolve each id through citation_index as before. Applies to both RAGState and AnalysisState.
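Clients holding state in the old nested shape can convert it to the new flat shape with an order-preserving de-duplication. A minimal sketch follows; flatten_citations is a hypothetical helper, not part of the package.

```python
def flatten_citations(nested: list[list[str]]) -> list[str]:
    """Flatten per-call groups into one list, keeping first-seen order."""
    # dict.fromkeys keeps insertion order and drops repeated chunk ids.
    return list(dict.fromkeys(cid for group in nested for cid in group))
```

For example, flatten_citations([["a", "b"], ["b", "c"]]) yields ["a", "b", "c"].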
0.42.0 - 2026-04-22
Fixed
- create_document, update_document, and rebuild (RECHUNK / full fallback) no longer misread URL-prefixed text as a URL to fetch. These paths passed known-text content through HaikuRAG.convert(), which dispatches on urlparse(source).scheme; text whose first line was https://... (common for clipped web pages and notes) got handed to httpx.get and crashed with httpx.InvalidURL on embedded whitespace. Fixed by calling converter.convert_text(...) directly at those sites; convert() itself is unchanged for create_document_from_source.
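The misclassification behind this fix is visible with the standard library alone. In the sketch below, looks_like_url is a hypothetical stand-in for scheme-based dispatch and is not the actual convert() code.

```python
from urllib.parse import urlparse


def looks_like_url(source: str) -> bool:
    """Scheme-based dispatch of the kind described above (hypothetical helper)."""
    return urlparse(source).scheme in {"http", "https"}


# A clipped note whose first line happens to be a URL.
clipped_note = "https://example.com/article\nMy notes on this page."
is_misread = looks_like_url(clipped_note)
```

The scheme check sees only the leading https: and cannot tell a bare URL from a multi-line note, which is why call sites that already know they hold text now bypass the dispatch.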
Changed
- Skills share a single HaikuRAG client per invocation via the new haiku.skills>=0.15.0 lifespan hook. The skill's sub-agent opens one read-only client on entry, all tool calls reuse it, and it closes on exit, replacing the old pattern of open/close around every search / list_documents / get_document call.
- max_searches tracked on RAGRunDeps.search_count instead of a module-level ctx.run_id-keyed dict. Eliminates a memory leak in long-running processes where old run ids were never evicted.
- Analysis sandbox persists variables across execute_code calls within one invocation. Re-enables the incremental-exploration workflow (search in one call, process results in the next). Each new skill invocation constructs a fresh Sandbox via the analysis lifespan, so there is no cross-invocation leak.
- Skill state is scoped to the current invocation. Lifespans now clear citations, searches, and (for analysis) executions at the start of each invocation, so state deltas sent to the AG-UI client reflect only the in-progress turn. citation_index is preserved across invocations so past-turn citation chunk ids remain resolvable, and document_filter is preserved as session-level config.
0.41.0 - 2026-04-20
Added
- Document virtual filesystem in analysis sandbox: Documents mounted at /documents/{id}/ with metadata.json (eager), content.txt (lazy), and items.jsonl (lazy). Standard Python pathlib.Path for browsing and reading document content and structure.
- execute_code skill tool: Direct code execution in the sandbox, surfaced as individual AG-UI events in the chat TUI. Items VFS uses a lazy bulk cache (~1s for 1000 documents vs 60s+ per-document queries).
- cite skill tool: Explicit citation registration with per-turn tracking via citation_index and citations fields in state
- --skill flag for chat TUI: haiku-rag chat -s rag -s analysis to enable specific skills
- --model overrides all agents: Chat, QA, research, and analysis agents all use the specified model
- Collapsible program display in chat TUI: Analysis code execution results shown as expandable code blocks
Changed
- BREAKING: Flatten skill architecture: Skill sub-agents now call search, execute_code, cite, list_documents, get_document directly, so every tool call surfaces as an AG-UI event. Removes the 3rd agent layer where ask / analyze / research spawned inner agents whose tool calls were invisible.
- BREAKING: Rename RLM agent to analysis agent throughout:
  - agents/rlm/ → agents/analysis/, all classes renamed (RLMResult → AnalysisResult, etc.)
  - client.rlm() → client.analyze()
  - CLI: haiku-rag rlm → haiku-rag analyze
  - MCP: rlm_question → analyze
  - Config: rlm: → analysis: in YAML, RLMConfig → AnalysisConfig
  - Skill entrypoint: rag-rlm → rag-analysis
- Analysis sandbox search() returns expanded results with doc_item_refs and labels for cross-referencing with items.jsonl
- list_documents skill tool takes no parameters and returns all documents
- Per-turn citation tracking: citation_index: dict[str, Citation] (deduplicated) + citations: list[list[str]] (per-turn chunk IDs) replaces flat citation list
- Search rate limiting: Skill search tool enforces config.qa.max_searches
- Context expansion respects section boundaries: Sections within the char budget are returned whole regardless of item count. Too-large sections expand bounded by section edges. Adjacent sections no longer merge; only overlapping ranges do.
- Visualization shows full expanded section: visualize_chunk expands context before resolving bounding boxes, so all pages the section spans get highlighted.
Removed
- ask skill tool: Replaced by direct search + cite; the skill sub-agent searches and answers directly
- analyze skill tool: Replaced by direct execute_code + search + cite
- research skill tool: Removed from skill layer (still available via CLI haiku-rag research and MCP)
- get_document(), get_docling_document(): Removed from analysis sandbox, replaced by VFS
- get_chunk(): Removed from analysis sandbox; search results include expanded context
- create_analysis_toolset(): Removed unused tools/analysis.py module
- qa_history, reports from skill state: Conversational context handled by the outer chat agent
- combine_filters, build_document_filter: Removed from public API
- max_context_items: Removed from SearchConfig; max_context_chars is the sole expansion constraint
- QAHistoryEntry, tools/qa.py: Removed unused QA history model and relevance threshold
0.40.1 - 2026-04-17
Fixed
- haiku-rag info on pre-migration databases: info no longer fails with a misleading Cannot create tables in read-only mode error when a required table added by a later version (e.g. document_items in 0.40.0) is absent. It now reports stats for the tables that do exist, marks the missing ones as absent, and shows a dedicated section listing any pending migrations with the haiku-rag migrate hint (#346)
0.40.0 - 2026-04-16
Added
- Document items table: Pre-extracted document items stored as individual rows with scalar indexes, enabling context expansion via indexed range queries (~2.5ms) instead of full DoclingDocument deserialization (~8.7s for large documents)
- Section-bounded context expansion: Expansion is now automatic and structure-aware: it stays within section boundaries for structured documents and grows outward for unstructured ones. Noise labels (footnotes, page headers/footers) are filtered. Results without doc_item_refs pass through unexpanded.
Changed
- Database migration required: Run haiku-rag migrate to populate the document_items table for existing documents
- Pin docling-core: Upper bound added (<2.72) to prevent uncontrolled schema changes
- max_searches default: Raised from 3 to 5; faster expansion makes additional searches inexpensive
- Improved QA prompt: Stronger instruction to refuse answering from tangentially related content
- Improved judge prompt: Asymmetric evaluation — generated answers that are more comprehensive than expected are not penalized
Removed
- context_radius config: Replaced by automatic section-bounded expansion. Context expansion no longer requires configuration.
- DoclingDocument LRU cache: No longer needed; the document_items table replaces in-memory caching for context expansion
- cachetools dependency: No longer used
0.39.0 - 2026-04-09
Added
- S3/Object storage support: Connect to LanceDB on S3, GCS, Azure Blob, or HDFS via lancedb.uri and storage_options config. Supports S3-compatible stores with custom endpoints.
- Remote skill generation: create-skill now supports remote databases: omit --db and provide --config-file to generate skills that connect to object storage at runtime instead of bundling the database.
Fixed
- Skill list_documents ignores document_filter: list_documents tool now respects state.document_filter, consistent with search, ask, and research
- Skill analyze ignores document_filter: analyze tool now uses state.document_filter (combined with any explicit filter parameter). Added document_filter field to RLMState
0.38.0 - 2026-04-07
Added
- Separate page storage: Page images stored in dedicated docling_pages column; search/expand never loads page data
- zstd compression: Switch from gzip to zstd for docling document storage (Python 3.14 stdlib, zstandard package for older versions)
- Document.set_docling(): Helper method that handles split compression and version assignment, replacing 11 manual call sites
- Document.get_page_images(): Load page images without the document structure, for visualize_chunk
- DocumentRepository.get_pages_data(): Load only page data column for a document
Changed
- Database migration required: Run haiku-rag migrate to split existing docling blobs into structure + pages and re-compress with zstd
Fixed
- Generated skill domain_preamble: Apply config.prompts.domain_preamble to instructions in generated skill packages
0.37.0 - 2026-04-07
Changed
- Dependency updates: lancedb 0.30.2, pydantic-ai-slim ≥1.77.0, docling ≥2.84.0, docling-core ≥2.71.0, haiku.skills ≥0.13.0, cachetools ≥7.0.5, pydantic-monty ≥0.0.9, cohere ≥5.21.1, textual ≥8.2.1, ty ≥0.0.28, ruff ≥0.15.9
- Search result model: SearchResult now includes order field propagated from chunk order
Fixed
- Type checking: Fix 37 new ty 0.0.28 diagnostics with proper None guards, assertions, and specific ignore codes
- Search performance: Avoid loading full document blobs (docling_document, content) during search — use column projection to fetch only needed metadata (id, uri, title, metadata)
- Context expansion performance: Load only docling columns during expand_context (skip content blob), and only when doc_item_refs exist
- Chunk expansion performance: Fetch only chunks in the needed order range during context expansion instead of all chunks for a document
- Embedding batching: Batch embedding calls in groups of 512 to avoid request size limits and timeouts with large documents
- DoclingDocument validation: Strip page images before validation on the read path — pages are only needed for visualize_chunk and account for ~99% of the JSON size
0.36.3 - 2026-04-01
Fixed
- Citation formatting: Replace raw UUIDs ([doc_id:chunk_id]) with human-readable identifiers ([index] title) in format_citations() output, preventing LLMs from hallucinating opaque ID markers in answers
- domain_preamble propagation: domain_preamble now flows to skill subagents and the main agent preamble, not just internal agents (QA, research). Fixes ambiguous queries failing when domain context was needed.
Changed
- domain_preamble docs: Clarified that domain_preamble is for domain context (subject matter, terminology), not behavioral instructions (tone, response style).
0.36.2 - 2026-03-28
Fixed
- Skill extras: Include db_path and config in skill extras for both RAG and RLM skills, enabling post-creation reconfiguration
0.36.1 - 2026-03-27
0.36.0 - 2026-03-26
Added
- Chunk visualization for generated skills: visualize_chunk(chunk_id) function exposed in generated skill packages, enabling callers to render visual grounding from chunk IDs in skill state
- Configurable generated skills: Generated skill create_skill() now accepts optional db_path and config parameters, enabling post-discovery reconfiguration via skill.reconfigure() (requires haiku.skills >= 0.11.0)
Fixed
- Generated skill packages: Include SKILL.md and assets in wheel distributions. Add README to generated packages.
- Docling-serve chunker: Detect per-document failure status that was silently returning 0 chunks when the task-level status was "success" but individual documents failed
- Docling local chunker: Re-enable repeat_table_header for self-contained table chunks, improving retrieval quality and matching docling-serve behavior
0.35.1 - 2026-03-24
Added
- create-skill CLI command: Generate standalone skill packages with embedded LanceDB databases. Generated packages register as haiku.skills entry points.
0.35.0 - 2026-03-24
Added
- Configurable judge and reflect models: evaluations run and evaluations optimize accept --judge-model provider:name; optimize also accepts --reflect-model provider:name. Both fall back to config.qa.model when not specified.
- parse_model_option: Utility in haiku.rag.utils for parsing provider:name strings into ModelConfig
- New format extensions: .tex, .latex, .qmd (Quarto), .rmd (R Markdown) supported in both local and serve converters
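The provider:name convention used by the --judge-model and --reflect-model options can be split as sketched below. This is not the parse_model_option implementation: split_model_option is a hypothetical helper that returns a plain tuple instead of a ModelConfig, and splitting on the first colon only (so that model names containing colons survive) is an assumption.

```python
def split_model_option(value: str) -> tuple[str, str]:
    """Split 'provider:name' on the first colon (illustrative only)."""
    provider, sep, name = value.partition(":")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider:name', got {value!r}")
    return provider, name
```

Under that assumption, split_model_option("ollama:qwen3:8b") keeps the tag inside the model name and returns ("ollama", "qwen3:8b").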
Changed
- LLMJudge: Custom evaluator now accepts ModelConfig instead of a model name string
- Docling upgrade: docling-core ≥2.70.2 (schema 1.10.0), docling ≥2.81.0. Adds field data model support for structured form/KV content, wide table chunking fixes, and rich table cell hang fix
- pydantic-ai ≥1.70.0: Bumped minimum version. Removed structured_output_type helper; all supported providers now handle native structured output, so agents pass result types directly
0.34.1 - 2026-03-16
Added
- PlantUML support: .puml, .plantuml, and .pu files are now indexed as plantuml code blocks
0.34.0 - 2026-03-13
Added
- Activity events: TUI and web frontend now display skill sub-agent tool calls via ActivitySnapshotEvent
Changed
- RLM sandbox: Bumped pydantic-monty to 0.0.8. Removed regex_* external functions; the sandbox now has native re and math modules via import. Also adds filter() and getattr() builtins.
- Frontend deps: Upgraded CopilotKit to 1.54.0 and @ag-ui/client to 0.0.47
0.33.3 - 2026-03-12
Added
- GEPA prompt optimization: evaluations optimize command for automated QA system prompt improvement using evolutionary optimization with LLM-judged scoring. Cases are split 50/50 into train/val sets; GEPA budget is auto-computed from --num-candidates and dataset size.
- Tuning docs: Added step 7 (Optimize QA Prompts) to the tuning workflow in docs/tuning.md
- Evaluations test coverage: Tests for evaluators (MAP, MRR), config, benchmark helpers, dataset mappers/builders, and optimization
Fixed
- Read-only mode table creation: --read-only no longer creates lance tables when pointed at an empty directory. Store._init_tables() now raises ReadOnlyError when tables are missing in read-only mode.
0.33.2 - 2026-03-11
Changed
- QA search cap: Replace dead max_iterations / max_concurrency config with max_searches (default: 3). The QA agent now enforces a per-run search limit, reducing average response time from ~30s to ~15s while maintaining accuracy. The limit resets per agent run so toolsets can be safely reused.
- Default search limit: Increased from 5 to 10 results per search query for better coverage.
Fixed
- QA citations: Strengthened prompt to clarify chunk ID format (complete IDs without brackets). resolve_citations now strips [] from IDs, handling models that copy brackets from search result formatting.
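The bracket handling can be illustrated with a small normalizer. This is a sketch only: normalize_chunk_id is a hypothetical name, and the real resolve_citations may treat whitespace or nested brackets differently.

```python
def normalize_chunk_id(raw: str) -> str:
    """Drop surrounding whitespace and square brackets from a cited id."""
    # str.strip with a character set removes any leading/trailing '[' or ']'.
    return raw.strip().strip("[]").strip()
```

With this, an id copied as "[abc-123]" resolves the same way as "abc-123".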
0.33.1 - 2026-03-06
Changed
- Default model temperatures: Set task-appropriate temperature defaults — 0.3 for QA, research, and title generation; 0.0 for RLM and picture description. Previously unset (provider defaults, typically 0.7–1.0).
- QA thinking enabled by default: enable_thinking now defaults to True for QA agent, improving answer quality with reasoning models.
- Default title max_tokens: Set max_tokens=100 for title generation model to keep titles concise
- Evaluation judge: Set temperature=0.0 and enable_thinking=True for deterministic, higher-quality judging. Removed unused judge config from retrieval benchmarks.
- Test suite cleanup: Removed stale VCR cassettes, dead fixtures, orphaned directories, and redundant tests. Strengthened weak assertions across search, context enhancement, and converter tests. Relocated misplaced SearchResult._get_primary_label test to test_search.py
- Parallel test execution: Added pytest-xdist and enabled parallel test runs by default (-n auto), reducing test suite time from ~3.5 min to ~2 min
0.33.0 - 2026-03-04
Added
- Module-level skill introspection API: STATE_TYPE, STATE_NAMESPACE, skill_metadata(), instructions(), and state_metadata() on haiku.rag.skills.rag and haiku.rag.skills.rlm; allows introspecting skill configuration without calling create_skill()
- Automatic structured output detection: Native JSON schema output is used automatically when the model supports it, with tool-call fallback otherwise. No configuration needed.
Changed
- haiku.skills dependency: Bumped to >=0.7.0 for StateMetadata dataclass
0.32.3 - 2026-03-03
Changed
- AG-UI skill streaming: Tool calls within skills are now streamed as real-time AG-UI events to the frontend. Requires haiku.skills>=0.6.0
Fixed
- Search tool regression: Removed LLM-facing filter parameter from search and list_documents tools. The SQL WHERE clause description confused LLMs, degrading QA accuracy. Document filtering is now handled programmatically via base_filter and state.document_filter
0.32.2 - 2026-02-28
Fixed
- Compatibility with haiku.skills 0.5.1: Replaced removed SkillToolset.system_prompt with build_system_prompt(toolset.skill_catalog) across chat TUI, backend app, and examples
- Minimum dependency: Bumped haiku.skills requirement to >=0.5.1
- Chat model default: Chat TUI and backend app now use the configured QA model instead of hardcoded openai:gpt-4o
0.32.1 - 2026-02-26
Added
- Automatic title generation: Documents can now have titles auto-generated during ingestion via processing.auto_title: true. Uses two-tier extraction: structural metadata from DoclingDocument (HTML <title>, h1, section headers) first, with LLM fallback via configurable processing.title_model
- generate_title(): Public method on HaikuRAG to generate a title for an existing document on demand
- rebuild --title-only: New rebuild mode that generates titles only for untitled documents without re-chunking or re-embedding
- add --title: CLI option to set a title when adding text documents
0.32.0 - 2026-02-24
Changed
- RLM sandbox: Replaced Docker-based code execution with pydantic-monty, a minimal secure Python interpreter written in Rust. Eliminates Docker as a runtime dependency for RLM with sub-millisecond sandbox startup
- RLM sandbox functions: Added get_chunk(chunk_id) for retrieving chunk content and metadata from search results. get_docling_document(document_id) now returns the full document structure as a JSON dict. All sandbox functions now require await
- RLMConfig: Removed docker_image and docker_memory_limit fields
Added
- RLM sandbox regex functions: regex_findall, regex_sub, regex_search, regex_split for pattern matching without LLM calls
- HaikuRAG.get_chunk_by_id(): Public method for chunk lookup by ID
Removed
- docker_sandbox.py, runner.py: Docker container plumbing replaced by sandbox.py
0.31.1 - 2026-02-20
Fixed
- info and history commands: Open database in read-only mode to prevent write failures on read-only filesystems
0.31.0 - 2026-02-20
Added
- RAG skill (haiku.rag.skills.rag): haiku.skills integration with search, list_documents, get_document, ask, and research tools plus managed RAGState
- RLM skill (haiku.rag.skills.rlm): haiku.skills integration with analyze tool for computational analysis via code execution
- HaikuRAG.research(): Client method for multi-agent research
- haiku.skills entry points: rag = "haiku.rag.skills.rag:create_skill", rag-rlm = "haiku.rag.skills.rlm:create_skill"
Changed
- Chat TUI: Rebuilt on RAG skill + haiku.skills SkillToolset
- Web app backend: Rebuilt on RAG skill + AGUIAdapter
- Toolsets simplified: Removed ToolContext, SessionState, AgentDeps, Toolkit; kept core FunctionToolset factories
- Research graph: Removed session_context and conversational output mode
Removed
- agents/chat/: Entire chat agent module (replaced by RAG skill)
- --deep flag: Removed from ask CLI (use research command instead)
- --context / --context-file: Removed from ask CLI
- tools/ state machinery: ToolContext, ToolContextCache, SessionState, AgentDeps, Toolkit, etc.
0.30.2 - 2026-02-19
Fixed
- Added cachetools as an explicit dependency (was only available transitively, causing ModuleNotFoundError for some installations)
- download-models: Show actionable error message when Ollama is not running instead of cryptic "All connection attempts failed" (#277)
0.30.1 - 2026-02-17
Changed
- AG-UI state sync: ask tool now emits StateDeltaEvent (JSON Patch) instead of StateSnapshotEvent, consistent with the search tool
0.30.0 - 2026-02-16
Added
- Composable toolsets: New haiku.rag.tools module with reusable FunctionToolset factories that can be mixed into any pydantic-ai agent
- create_search_toolset(): hybrid search with context expansion and citation tracking
- create_document_toolset(): document listing, retrieval, and summarization
- create_qa_toolset(): question answering via research graph with prior answer recall
- create_analysis_toolset(): computational analysis via RLM agent (Docker sandbox)
- Toolkit and build_toolkit(): High-level factory that bundles toolsets, prompt, and context creation for a given feature set. Reduces agent composition from ~15 lines to ~5. build_chat_toolkit() adds chat-specific defaults (background summarization callback)
- ToolContext: Namespace-based state container shared across toolsets. Toolsets register Pydantic models under string namespaces, enabling state accumulation (search results, citations, QA history) across invocations
- ToolContextCache: In-memory TTL-based cache for ToolContext instances, keyed by external session/thread ID. Replaces module-level caches for embeddings and summaries
- run_qa_core(): Extracted core QA function for direct programmatic use without an agent
- Feature-based chat agent: create_chat_agent() accepts a features list to select which toolsets are enabled (search, documents, qa, analysis). System prompt is composed to match
- New documentation: docs/tools.md covers all toolsets, ToolContext, state management, filter helpers, and composing custom agents
Changed
- Toolset factories decoupled from runtime dependencies: create_search_toolset(), create_qa_toolset(), create_document_toolset(), create_analysis_toolset(), and create_chat_agent() no longer take client or context parameters. Instead, tool functions receive these via pydantic-ai's RunContext.deps. This enables toolset and agent creation at configuration time (cacheable, created once), with only lightweight deps created per-request. Deps must satisfy the RAGDeps protocol (client: HaikuRAG, tool_context: ToolContext | None)
- Toolset factory return types narrowed to FunctionToolset[RAGDeps]: All four toolset factories now declare their return type as FunctionToolset[RAGDeps] instead of bare FunctionToolset
- create_chat_agent() accepts optional toolkit parameter: Pass a pre-built Toolkit to share toolsets between agent and context creation, avoiding duplicate construction
- ChatDeps now includes client: ChatDeps(config=..., client=..., tool_context=...); the client field was added since it's no longer captured by the agent factory
- prepare_chat_context() helper: Extracted from create_chat_agent() for idempotent namespace registration, since the agent factory no longer has access to the context
- Chat agent architecture: Rebuilt on composable toolsets instead of monolithic tool definitions. Chat agent is now a thin wrapper around create_search_toolset, create_document_toolset, create_qa_toolset, and create_analysis_toolset
- State management simplified: Removed session_id, incoming_session_id, and incoming_session_context from the state layer. ToolContextCache preserves all state (embeddings, summaries, QA history) on cached ToolContext instances, eliminating the need for module-level caches
- AG-UI state sync: ask tool now emits StateSnapshotEvent instead of StateDeltaEvent, ensuring background summarization results are reliably delivered to clients
- TUI simplified: Chat TUI reads directly from ToolContext namespace states instead of maintaining a separate ChatSessionState and manually syncing via AG-UI state events
- AG-UI web app: Uses ToolContextCache to maintain per-thread state across requests
- Frontend session management: Persistent chat sessions with localStorage, wired to backend ToolContextCache via CopilotKit threadId
- Session manager dropdown: create, switch, delete, and export sessions to markdown
- Messages, chat state, and citations restored on session switch
- Session title derived from first user message
- Inline citation blocks injected after assistant responses via qa_history correlation
Removed
- SearchAgent: Replaced by create_search_toolset()
- Module-level session caches: _session_cache, cache_session_context, get_cached_session_context, cache_question_embedding, get_cached_embedding; all replaced by cached ToolContext
- ChatSessionState from TUI: TUI no longer maintains its own copy of session state
0.29.1 - 2026-02-10
Fixed
- Document listing memory usage: list_documents no longer loads full document content and docling blobs by default, preventing out-of-memory errors on large databases. Use include_content=True when content is needed.
- Chat session_id not persisting across AG-UI requests: ChatSessionState.session_id now defaults to "" instead of auto-generating a UUID. This ensures the session_id assignment is detected as a state change and included in the StateDeltaEvent delta, allowing clients to persist it across requests.
0.29.0 - 2026-02-06
Added
- docling-serve Chunker OCR Options: The docling-serve chunker now respects OCR settings from conversion_options
- Passes do_ocr, force_ocr, ocr_engine, and ocr_lang to the chunking API
- Allows disabling OCR via config when running docling-serve in read-only containers
- RLM Agent (Recursive Language Model): New agent for complex analytical tasks via sandboxed Python code execution
- Solves problems traditional RAG can't handle: aggregation, computation, multi-document analysis
- Docker-based sandbox with full Python environment (no import restrictions)
- Container reuse within a single rlm() call for reduced latency
- Available functions: search(), list_documents(), get_document(), get_docling_document(), llm()
- Pre-loaded documents support via documents variable
- Context filter for scoping searches without LLM control
- New client.rlm(question) method on HaikuRAG client
- New haiku-rag rlm CLI command
- New rlm_question MCP tool
- New config options: docker_image, docker_memory_limit
- CI: Docker sandbox integration tests run in GitHub Actions
Fixed
- CI: Cache HuggingFace tokenizer to prevent flaky test failures when HuggingFace has transient outages
0.28.0 - 2026-01-31
Changed
- Iterative Research Planning: Research graph now uses an iterative feedback loop instead of batch question processing
- Planner proposes ONE question at a time, sees the answer, then decides whether to continue
- Removes gather_context tool; planner proposes questions directly
- Simpler flow: plan_next → search_one → loop back until complete → synthesize
- Consolidated build_conversational_graph() into build_research_graph(output_mode="conversational")
Removed
- Dead config options: Removed vestigial fields from iterative planning refactor
- confidence_threshold from ResearchConfig and ResearchState (LLM decides completion via is_complete)
- max_sub_questions from QAConfig (iterative flow uses one question at a time)
- sub_questions field from ResearchContext (no longer populated)
0.27.2 - 2026-01-29
Added
- Deep Ask Evaluations: QA benchmarks can now use the research graph for multi-step reasoning
- New --deep flag on evaluations run enables deep ask mode
- Uses research graph with max_iterations=2 and confidence_threshold=0.0
- Evaluation name automatically suffixed with _deep when enabled
- Experiment metadata includes deep_ask field for tracking
- Chat Agent Document Awareness Tools: Two new tools for browsing and understanding the knowledge base
- list_documents: Returns DocumentListResponse with paginated documents (50 per page), page number, total pages, and total count; respects session document filter
- summarize_document: Generate LLM-powered summaries of specific documents
- Document Count API: New count_documents(filter) method on HaikuRAG client for efficient document counting
- Read-Only Initial Context: Initial context is now locked after the first message, providing consistent session context
- Chat TUI: --initial-context CLI option sets background context for the session
--initial-contextCLI option sets background context for the session - Context can be edited via command palette before the first message is sent
- After first message, context becomes read-only (view only)
- Clearing chat resets context to CLI value and unlocks editing
- Web app: Memory panel now serves dual purpose - edit initial context before first message, view session context after
- Agent uses initial_context as fallback when session_context is empty
Changed
- AG-UI State Delta Updates: Web application now sends StateDeltaEvent (JSON Patch RFC 6902) instead of full StateSnapshotEvent for state updates
- Reduces bandwidth when state grows large (e.g., 50 Q&As with citations)
- First request still sends full snapshot; subsequent requests send only changes
- Backend logging shows incoming/outgoing state events for debugging
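To make the bandwidth argument concrete, the sketch below builds an RFC 6902 style patch for top-level changes between two state dicts. The shallow_patch function and the example state keys are illustrative assumptions, not the backend's code; a real implementation would also diff nested values.

```python
from typing import Any


def _pointer(key: str) -> str:
    # JSON Pointer (RFC 6901) escaping: "~" becomes "~0", "/" becomes "~1".
    return "/" + key.replace("~", "~0").replace("/", "~1")


def shallow_patch(old: dict[str, Any], new: dict[str, Any]) -> list[dict[str, Any]]:
    """Illustrative top-level diff producing JSON Patch operations."""
    ops: list[dict[str, Any]] = []
    for key in old:
        if key not in new:
            ops.append({"op": "remove", "path": _pointer(key)})
    for key, value in new.items():
        if key not in old:
            ops.append({"op": "add", "path": _pointer(key), "value": value})
        elif old[key] != value:
            ops.append({"op": "replace", "path": _pointer(key), "value": value})
    return ops


# Hypothetical state snapshots before and after a tool call.
old_state = {"session_id": "", "qa_history": []}
new_state = {"session_id": "abc", "qa_history": [], "citations": ["c1"]}
patch = shallow_patch(old_state, new_state)
```

Only the changed keys appear in the patch, so an unchanged qa_history is not resent.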
Fixed
- Chat TUI Session State Sync: TUI now syncs full session state from AG-UI events
0.27.1 - 2026-01-27
Added
- Initial Context for Chat Sessions: New initial_context field on ChatSessionState allows external clients to seed sessions with background context
- Static context set once at session creation, used as fallback when no cached session context exists
- Incorporated into first summarization, after which evolved session_context takes precedence
- Eliminates need for clients to import and call internal cache functions (cache_session_context, get_cached_session_context)
- session_id now auto-generates a UUID if not provided (previously defaulted to empty string)
Fixed
- AG-UI StateSnapshotEvent JSON Serialization: Chat agent tools now use model_dump(mode="json") when creating StateSnapshotEvent
- Fixes TypeError: Object of type datetime is not JSON serializable when external clients persist AG-UI state to database JSON columns
0.27.0 - 2026-01-26
Added
- Evaluation Database Hosting: Pre-built evaluation databases available on HuggingFace
- evaluations download <dataset> downloads pre-built databases from ggozad/haiku-rag-eval-dbs
- evaluations upload <dataset> uploads databases to HuggingFace (maintainer only)
- Supports all argument to download/upload all datasets at once
- Use --force flag to overwrite existing databases
- Avoids lengthy database rebuild times for users running benchmarks
- Stable Citation Registry: Citation indices now persist across tool calls within a session
- Same chunk_id always returns the same citation index (first-occurrence-wins)
- New citation_registry: dict[str, int] field on ChatSessionState
- New get_or_assign_index(chunk_id) method for stable index assignment
- Registry serialized/restored via AG-UI state protocol
- Prior Answer Recall: The ask tool automatically checks conversation history before research
- Finds semantically similar prior answers using embedding similarity (0.7 cosine threshold)
- Relevant prior answers are passed to the research planner as context
- Planner can return empty sub_questions when context is sufficient, avoiding redundant searches
- Dynamic Session Context: Compressed conversation history for multi-turn chat
- New SessionContext model stores summarized conversation state instead of raw Q&A history
- Background LLM-based summarization runs after each ask tool call (non-blocking)
- Previous summarization tasks are cancelled when new ones start
- Research graph receives compact context (~1,000-2,000 tokens) instead of raw qa_history (potentially thousands of tokens)
- New session_context field on ChatSessionState synced via AG-UI state protocol
- Chat TUI: New context modal (Ctrl+O) to view current session context
- Session Document Filter: Restrict all search/ask operations to selected documents
- New document_filter field on ChatSessionState stores list of document titles/URIs
- Session filter combines with per-tool document_name filter using AND logic
- Multi-document selection uses OR logic within the session filter
- Filter persists across tool calls and chat clears via AG-UI state protocol
- Chat TUI: Access via command palette ("Filter documents" command)
- Web Application: Filter button in header shows count of selected documents
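The first-occurrence-wins behaviour of the stable citation registry listed above can be sketched as follows. The CitationRegistry class is a stand-in for illustration; only the citation_registry dict[str, int] shape and the get_or_assign_index(chunk_id) name come from the entry, and 1-based numbering is an assumption.

```python
class CitationRegistry:
    """Illustrative stand-in for the registry kept on the session state."""

    def __init__(self) -> None:
        self.citation_registry: dict[str, int] = {}

    def get_or_assign_index(self, chunk_id: str) -> int:
        # First occurrence wins: an index is assigned once and never changes.
        if chunk_id not in self.citation_registry:
            # Assumption: indices start at 1 and grow in order of first use.
            self.citation_registry[chunk_id] = len(self.citation_registry) + 1
        return self.citation_registry[chunk_id]


registry = CitationRegistry()
indices = [registry.get_or_assign_index(c) for c in ["a", "b", "a", "c"]]
```

Because the registry is a plain dict of strings to integers, it can round-trip through a JSON state payload unchanged.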
Changed
- Dependencies: Updated core dependencies
- pydantic-ai-slim: 1.44.0 → 1.46.0
- lancedb: 0.26.1 → 0.27.0
- docling: 2.68.0 → 2.69.1
- docling-core: 2.59.0 → 2.60.1
- VoyageAI Embeddings: Now uses pydantic-ai-slim's native VoyageAI support instead of custom implementation
- Removed haiku.rag.embeddings.voyageai module
- The voyageai extra now delegates to pydantic-ai-slim[voyageai]
Removed
- Q&A History Functions: Removed standalone conversation history utilities
- rank_qa_history_by_similarity(): similarity matching now integrated into ask tool
- format_conversation_context(): replaced by SessionContext summarization
- Associated embedding cache and helper functions also removed
0.26.9 - 2026-01-22
Fixed
- v0.25.0 Migration Failure: Fixed "Table 'documents' already exists" error during migration caused by held table references preventing drop_table() from succeeding. Added recovery logic to restore documents from staging table if a previous migration attempt failed mid-way.
0.26.8 - 2026-01-22
Added
- Jina Reranker v3: Added support for Jina reranking with API mode (provider: jina) and local inference (provider: jina-local, requires [jina] extra)
- Model Downloads: download-models now pre-downloads HuggingFace models for sentence-transformers, mxbai, and jina-local
- Reranker Factory: Removed unreliable id(config)-based caching from get_reranker(); factory now always instantiates fresh
Changed
- Agent Search Result Display: Search results now show rank position instead of raw scores
- SearchResult.format_for_agent() accepts optional rank and total parameters
- Output changes from (score: 0.02) to [rank 1 of 5] when rank is provided
- Prevents LLMs from misinterpreting low RRF hybrid search scores as "2% relevant"
- QA and Research agents updated to pass rank/total to formatted results
- Agent prompts updated to reference rank-based ordering instead of scores
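The arithmetic behind the low scores is worth seeing once. Reciprocal rank fusion sums 1 / (k + rank) over the result lists, so even a chunk ranked first in both lists scores only a few hundredths. The sketch below assumes the commonly used constant k = 60; whether the search backend uses that exact value is an assumption, and format_rank is a hypothetical helper mirroring the output format quoted above.

```python
def rrf_score(ranks: list[int], k: int = 60) -> float:
    """Reciprocal rank fusion over 1-based ranks; k=60 is an assumed constant."""
    return sum(1.0 / (k + rank) for rank in ranks)


def format_rank(rank: int, total: int) -> str:
    # Hypothetical formatter producing the rank label from the changelog entry.
    return f"[rank {rank} of {total}]"


best_possible = rrf_score([1, 1])  # ranked first by both vector and full-text search
label = format_rank(1, 5)
```

Under that assumption the best possible two-list score is 2/61, about 0.033, which an LLM can easily misread as a weak match; a rank label carries no such ambiguity.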
Fixed
- Test Cassette Organization: Consolidated all VCR cassettes to tests/cassettes/
- Environment Loading: Fixed .env file loading to search from current working directory instead of source file directory (#250) - thanks @tianyicui
0.26.7 - 2026-01-20
Added
- OCR Engine Selection: New ocr_engine option in conversion_options to explicitly select OCR backend (#246)
- Supported engines: auto (default), easyocr, rapidocr, tesseract, tesserocr, ocrmac
- Works with both docling-local and docling-serve converters
- Fixes inconsistent OCR engine selection between docling-serve startup and conversion requests
Removed
- A2A Example: Removed examples/a2a-server/ A2A protocol server example
- Stale Example References: Cleaned up references to removed ag-ui-research example from documentation
Changed
- MCP Error Handling: MCP tools now let exceptions propagate naturally; FastMCP converts them to proper MCP error responses
- Chunk Contextualization: Consolidated duplicate contextualize logic into Chunk.contextualize_content() method
- Type Checker: Replaced pyright with ty, Astral's extremely fast Python type checker
- Added explicit Agent[Deps, Output] type annotations to all pydantic-ai agents for better type inference
- Removed ~24 unnecessary # type: ignore comments that ty correctly infers
- Dependencies: Updated to latest versions
- pydantic-ai-slim: 1.39.0 → 1.44.0
- docling: 2.67.0 → 2.68.0
- pathspec: 0.12.1 → 1.0.3
- textual: 7.0.0 → 7.3.0
- datasets: 4.4.2 → 4.5.0
- ruff: 0.14.11 → 0.14.13
- opencv-python-headless: 4.12.0.88 → 4.13.0.90
Fixed
- Chat TUI: Fixed crash when logfire is installed but user is not authenticated (#247)
0.26.6 - 2026-01-19
Changed
- Explicit Database Migrations: Database migrations are no longer applied automatically on open
- Opening a database with pending migrations now raises MigrationRequiredError with a clear message
- New haiku-rag migrate command to explicitly apply pending migrations
- Version-only updates (no schema changes) are applied silently in writable mode
- New skip_migration_check parameter on Store for tools that need to bypass the check
- Store.migrate() method returns list of applied migration descriptions
0.26.5 - 2026-01-16
Added
- Background Context Support: Pass background context to agents via CLI or Python API
- haiku-rag ask --context "..." --context-file path for Q&A with background context
- haiku-rag research --context "..." --context-file path for research with background context
- haiku-rag chat --context "..." --context-file path for chat sessions with persistent context
- ResearchContext(background_context="...") for Python API usage
- ChatSessionState(background_context="...") for chat agent sessions
- Context is included in agent system prompts and research graph planning
- Frontend Background Context: Settings panel in the chat app to configure persistent background context
- Context is stored in localStorage and sent with each conversation
- Frontend Linting: Added Biome for linting and formatting the frontend codebase
0.26.4 - 2026-01-15
Added
- AGUI_STATE_KEY Constant: Exported AGUI_STATE_KEY ("haiku.rag.chat") from haiku.rag.agents.chat for namespaced AG-UI state emission
- Enables integrators to use a consistent key when combining haiku.rag with other agents
- Backend, TUI, and frontend now use this key for state emission and extraction
0.26.3 - 2026-01-15
Added
- Enhanced Database Info: haiku-rag info now displays pydantic-ai version and docling-document schema version
- Keyed State Emission for Chat Agent: New state_key parameter in ChatDeps for namespaced AG-UI state snapshots
- When set, tools emit {state_key: snapshot} instead of bare state, enabling state merging when multiple agents share state
- Default None preserves backwards compatibility (bare state emission)
- Page Image Generation Control: New generate_page_images option in ConversionOptions to control PDF page image extraction
Changed
- CLI Error Handling: Commands (rebuild, vacuum, create-index, ask, research) now propagate errors with proper exit codes instead of swallowing exceptions
Fixed
- Embed-only rebuild with changed vector dimensions: Fixed haiku-rag rebuild --embed-only failing when the configured embedding model has different dimensions than the database
- Store now reads stored vector dimension when opening existing databases, allowing chunks to be read regardless of current config
- _rebuild_embed_only recreates the chunks table to handle dimension changes
- generate_page_images: bool = True - Enable/disable rendered page images (used by visualize_chunk())
- Works with both docling-local and docling-serve converters
- For docling-serve, maps to image_export_mode API parameter (embedded/placeholder)
- Note: generate_picture_images (embedded figures/diagrams) works with local converter but has limited support in docling-serve
0.26.2 - 2026-01-13
Changed
- Dependencies: Updated docling dependencies for latest docling-serve compatibility (#229)
- docling-core: 2.57.0 → 2.59.0 (supports schema 1.9.0)
- docling: 2.65.0 → 2.67.0
0.26.1 - 2026-01-13
Fixed
- Docling Schema Version Mismatch: Fixed incompatibility between docling and docling-core causing ValidationError: Doc version 1.9.0 incompatible with SDK schema version 1.8.0 when adding documents (#229)
- Root cause: docling-core was reverted to 2.57.0 (schema 1.8.0) for docling-serve compatibility, but docling remained at 2.67.0 (schema 1.9.0)
- Fix: Reverted docling from 2.67.0 to 2.65.0 to match docling-core schema version
0.26.0 - 2026-01-13
Added
- Conversational RAG Application: Full-stack application (app/) with CopilotKit frontend and pydantic-ai AG-UI backend
- Next.js frontend with chat interface, citation display, and visual grounding
- Starlette backend using pydantic-ai's native AGUIAdapter for streaming
- Docker Compose setup for development (docker-compose.dev.yml) and production
- Logfire integration for debugging LLM calls
- SSE heartbeat to prevent connection timeouts
- Chat Agent (haiku.rag.agents.chat): New conversational RAG agent optimized for multi-turn chat
- create_chat_agent() factory function for creating chat agents with AG-UI support
- SearchAgent for internal query expansion with deduplication
- ChatDeps and ChatSessionState for session management
- CitationInfo and QAResponse models for structured responses
- Natural language document filtering via build_document_filter()
- Configurable search limit per agent
- Chat TUI (haiku-rag chat): Terminal-based chat interface using Textual
- Single chat window with inline tool calls and expandable citations
- Visual grounding (v key) reuses inspector's VisualGroundingModal
- Database info (i key) shows document/chunk counts and storage info
- Keybindings: q quit, Ctrl+L clear chat, Escape focus input
- Q/A History Management: Intelligent conversation history with semantic ranking
- FIFO queue with 50 max entries
- Embedding cache to avoid re-embedding Q/A pairs
- rank_qa_history_by_similarity() returns top-K most relevant history entries
- Confidence filtering to exclude low-confidence answers from context
- Conversational Research Graph: Simplified single-iteration research graph for chat
- build_conversational_graph() optimized for conversational Q&A
- Context-aware planning (generates fewer sub-questions when history exists)
- ConversationalAnswer output type with direct answer and citations
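The "FIFO queue with 50 max entries" behaviour listed under Q/A History Management can be reproduced with a bounded deque from the standard library. This is a generic sketch of the data structure, not the project's history code; the tuple layout is an assumption.

```python
from collections import deque

MAX_ENTRIES = 50  # limit taken from the changelog entry

# A deque with maxlen silently evicts the oldest item when a new one arrives.
history: deque[tuple[str, str]] = deque(maxlen=MAX_ENTRIES)

for i in range(60):
    history.append((f"question {i}", f"answer {i}"))

oldest_kept = history[0]
newest = history[-1]
```

After 60 appends the first ten pairs have been evicted, so the oldest retained entry is number 10.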
Changed
- BREAKING: Module Reorganization: Consolidated all agent code under haiku.rag.agents
- Moved haiku.rag.qa → haiku.rag.agents.qa
- Moved haiku.rag.graph.research → haiku.rag.agents.research
- Added haiku.rag.agents.chat module with conversational RAG agent
- Deleted haiku.rag.graph module (research graph now at haiku.rag.agents.research.graph)
Removed
- BREAKING: Custom AG-UI Infrastructure: Removed custom AG-UI event handling in favor of pydantic-ai's native AG-UI support
- Deleted haiku.rag.graph.agui module (AGUIEmitter, AGUIConsoleRenderer, stream_graph(), create_agui_server())
- Removed --agui flag from serve command
- Removed --verbose flags from ask and research commands
- Removed --interactive flag from research command
- Removed AGUIConfig from configuration
- Deleted cli_chat.py interactive chat module
- Research graph now uses graph.run() directly instead of stream_graph()
- For AG-UI streaming, use pydantic-ai's native AGUIAdapter with ToolReturn and StateSnapshotEvent (see app/backend/ for example)
- AG-UI Research Example: Removed examples/ag-ui-research/ (replaced by app/)
0.25.0 - 2026-01-12
Fixed
- Large Document Storage Overflow: Fixed "byte array offset overflow" panic when vacuuming/rebuilding databases with many large PDF documents (#225)
- Root cause: Arrow's 32-bit string column offsets limited to ~2GB per fragment
- Changed docling_document_json (string) to docling_document (bytes) with large_binary Arrow type (64-bit offsets)
- Added gzip compression for DoclingDocument JSON (~1.4x compression ratio)
- Migration automatically compresses existing documents in batches to avoid memory issues
- Breaking: Migration is destructive - all table version history is lost after upgrade
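The storage change above amounts to serializing the document as JSON, compressing it with gzip, and storing the resulting bytes. The round trip below is a minimal sketch using only the standard library; the sample dict is made up and stands in for a real DoclingDocument export, so its compression ratio says nothing about the ~1.4x figure quoted for real documents.

```python
import gzip
import json

# Made-up stand-in for a DoclingDocument JSON export.
doc = {"name": "example", "texts": [{"text": "hello world"}] * 200}

raw = json.dumps(doc).encode("utf-8")  # what a string column would hold
blob = gzip.compress(raw)              # what a binary column would hold
restored = json.loads(gzip.decompress(blob).decode("utf-8"))
```

The round trip is lossless, and the compressed blob is smaller than the raw JSON for repetitive content like this sample.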
Changed
- Dependencies: Updated lancedb 0.26.0 → 0.26.1, docling 2.65.0 → 2.67.0
Removed
- Legacy Migrations: Removed obsolete database migration files (v0_9_3.py, v0_10_1.py, v0_19_6.py). These migrations were for versions prior to 0.20.0 and are no longer needed since the current release requires a database rebuild anyway.
0.24.2 - 2026-01-08
Fixed
- Base64 Images in Expanded Context: Fixed base64 image data leaking into expanded search results when expand_context() processed PictureItem objects. The issue was PictureItem.export_to_markdown() defaulting to EMBEDDED mode. Now explicitly uses PLACEHOLDER mode to prevent base64 data while still including VLM descriptions and captions.
0.24.1 - 2026-01-08
Fixed
- OpenAI Non-Reasoning Models: Fixed reasoning_effort parameter being sent to non-reasoning OpenAI models (gpt-4o, gpt-4o-mini), causing 400 errors. Now correctly detects reasoning models (o1, o3 series) using pydantic-ai's model profile.
- Bedrock Non-Reasoning Models: Fixed same issue for OpenAI models on Bedrock.
0.24.0 - 2026-01-07
Added
- VLM Picture Description: Describe embedded images using Vision Language Models during document conversion
- Images are sent to a VLM for automatic description via OpenAI-compatible API
- Descriptions become searchable text, improving RAG retrieval for visual content
- Configure via processing.conversion_options.picture_description with enabled, model, timeout, max_tokens
- Default prompt customizable via prompts.picture_description
- Requires OpenAI-compatible /v1/chat/completions endpoint (Ollama, OpenAI, vLLM, LM Studio)
0.23.2 - 2026-01-05
Fixed
- AG-UI Concurrent Step Tracking: Emitter now correctly tracks multiple concurrent steps (#216)
Changed
- Dependencies: Updated core and development dependencies
0.23.1 - 2025-12-29
Added
- Contextualized FTS Search: Full-text search now includes section headings
- New content_fts column stores contextualized content (headings + body text)
- FTS index now searches content_fts for better keyword matching on section context
- Original content column preserved for display and context expansion
- Migration automatically populates content_fts for existing databases
- GitHub Actions CI: Test workflow runs pytest, pyright, and ruff on push/PR to main
- VCR Cassette Recording: Integration tests use recorded HTTP responses for deterministic CI runs
- LLM tests (QA, embeddings, research graph) replay from cassettes without real API calls
- docling-serve tests run without Docker container in CI
- Uses pytest-recording with custom JSON body serializer
0.23.0 - 2025-12-26
Added
- Prompt Customization: Configure agent prompts via prompts config section
- domain_preamble: Prepended to all agent prompts for domain context
- qa: Full replacement for QA agent prompt
- synthesis: Full replacement for research synthesis prompt
Changed
- Embeddings: Migrated to pydantic-ai's embeddings module
- Uses pydantic-ai v1.39.0+ embeddings with instrumentation and token counting support
- Explicit embed_query() and embed_documents() API for query/document distinction
- New providers available: Cohere (cohere:), SentenceTransformers (sentence-transformers:)
- VoyageAI refactored to extend pydantic-ai's EmbeddingModel base class
- Configuration: Added base_url to ModelConfig and EmbeddingModelConfig
- Enables custom endpoints for OpenAI-compatible providers (vLLM, LM Studio, etc.)
- Model-level base_url takes precedence over provider config
Deprecated
- vLLM and LM Studio providers: Use openai provider with base_url instead
- provider: vllm → provider: openai with base_url: http://localhost:8000/v1
- provider: lm_studio → provider: openai with base_url: http://localhost:1234/v1
Removed
- Deleted obsolete embedder implementations: ollama.py, openai.py, vllm.py, lm_studio.py, base.py
- Removed VLLMConfig and LMStudioConfig from configuration (use base_url in model config instead)
0.22.0 - 2025-12-19
Added
- Read-Only Mode: Global --read-only CLI flag for safe database access without modifications
- Blocks all write operations at the Store layer
- Skips database upgrades and settings saves on open
- Excludes write tools (add_document_*, delete_document) from MCP server
- Disables file monitor with warning when --read-only is used with serve --monitor
- Time Travel: Query the database as it existed at a previous point in time
- Global --before CLI flag accepts datetime strings (ISO 8601 or date-only)
- Automatically enables read-only mode when time-traveling
- New history command shows version history for database tables
historycommand shows version history for database tables - Useful for debugging and auditing
- Supported throughout: CLI, Client, App, Inspector
Fixed
- File Monitor Path Validation: Monitor now validates directories exist before watching (#204)
  - Provides clear error message pointing to `haiku.rag.yaml` configuration
  - Prevents cryptic `FileNotFoundError: No path was found` from watchfiles
- Docker Documentation: Improved Docker setup instructions
  - Added volume mount examples for config file and documents directory
  - Clarified that `monitor.directories` must use container paths, not host paths
Changed
- Dependencies: Updated core dependencies
  - `pydantic-ai-slim`: 1.27.0 → 1.36.0 (FileSearchTool, web chat UI, GPT-5.2 support, prompt caching)
  - `lancedb`: 0.25.3 → 0.26.0
  - `docling`: 2.64.0 → 2.65.0
  - `docling-core`: 2.54.0 → 2.57.0
0.21.0 - 2025-12-18
Added
- Interactive Research Mode: Human-in-the-loop research using graph-based decision nodes
  - `haiku-rag research --interactive` starts conversational CLI chat
  - Natural language interpretation for user commands (search, modify questions, synthesize)
  - Chat with assistant before starting research, and during decision points
  - Review collected answers and pending questions at each decision point
  - Add, remove, or modify sub-questions through natural conversation
  - New `human_decide` graph node emits AG-UI tool calls (TOOL_CALL_START/ARGS/END) for frontend integration
  - New `emit_tool_call_start()`, `emit_tool_call_args()`, `emit_tool_call_end()` AG-UI event helpers
  - New `AGUIEmitter.emit()` method for direct event emission
- AG-UI Research Example: Human-in-the-loop research with client-side tool calling
  - Frontend handles `human_decision` tool calls via AG-UI `TOOL_CALL_*` events
  - Tool results sent directly to backend `/v1/research/stream` endpoint
  - Backend queues decisions and continues the research graph
- HotpotQA Evaluation: Added HotpotQA dataset adapter for multi-hop QA benchmarks
  - Extracts unique documents from validation set context paragraphs
  - Uses MAP for retrieval evaluation (multiple supporting documents per question)
  - Run with `evaluations hotpotqa`
- Plain Text Format: Added `format="plain"` for text conversion
  - Use when content is plain text without markdown/HTML structure
  - Falls back gracefully when docling cannot detect markdown format in content
  - Supported in `create_document()`, `convert()`, and all converter classes
Changed
- AG-UI Events: Replaced custom event classes with `ag_ui.core` types
  - Removed `haiku.rag.graph.agui.events` module
  - Event factory functions (`emit_*`) now wrap official `ag_ui.core` event classes
- Chunker Sets Order: Chunkers now set `chunk.order` directly
- Unified Research Graph: Simplified and unified research and deep QA into a single configurable graph
  - Removed `analyze_insights` node - graph now flows directly from `collect_answers` to `decide`
  - Simplified `EvaluationResult` to: `is_sufficient`, `confidence_score`, `reasoning`, `new_questions`
  - Simplified `ResearchContext` - removed insight/gap tracking methods
  - `ask --deep` now uses research graph with `max_iterations=2`, `confidence_threshold=0.0`
  - `ask --deep` output now shows executive summary, key findings, and sources
  - Added `include_plan` parameter to `build_research_graph()` for plan-less execution
  - Added `max_iterations` and `confidence_threshold` overrides to `ResearchState.from_config()`
- Improved Synthesis Prompt: Updated synthesis agent prompt to produce direct answers
  - Executive summary now directly answers the question instead of describing the report
  - Added explicit examples of good vs bad output style
- Evaluations Vacuum Strategy: `populate_db` now uses periodic vacuum to prevent disk exhaustion with large datasets
  - Disables auto_vacuum during population, vacuums every N documents with retention=0
  - New `--vacuum-interval` CLI option (default: 100) to control vacuum frequency
  - Prevents disk space issues when building databases with thousands of documents (e.g., HotpotQA)
- Benchmarks Documentation: Restructured benchmarks.md for clarity
  - Added dedicated Methodology section explaining MRR, MAP, and QA Accuracy metrics
  - Organized results by dataset with retrieval and QA subsections
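
For readers unfamiliar with the retrieval metrics named above, the following sketch computes MRR and MAP from ranked result lists using their textbook definitions. It is an illustration only: the function names are hypothetical, and the project's own evaluators may differ in details such as how average precision is normalized.

```python
def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Mean of precision@k over the ranks k where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Normalize by the number of relevant documents (assumed convention).
    return precision_sum / len(relevant)


def mean(values: list[float]) -> float:
    return sum(values) / len(values) if values else 0.0


queries = [
    (["d3", "d1", "d7"], {"d1"}),        # first relevant hit at rank 2
    (["d2", "d5", "d4"], {"d2", "d4"}),  # relevant hits at ranks 1 and 3
]
mrr = mean([reciprocal_rank(r, rel) for r, rel in queries])
map_score = mean([average_precision(r, rel) for r, rel in queries])
print(f"MRR={mrr:.3f} MAP={map_score:.3f}")
```

MRR only rewards the first relevant hit, which suits datasets with one supporting document per question; MAP accounts for every relevant document, which matches multi-document cases such as HotpotQA.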
Removed
- Deep QA Graph: Removed `haiku.rag.graph.deep_qa` module entirely
  - Use `build_research_graph()` with appropriate parameters instead
  - `ask --deep` CLI command now uses research graph internally
- Insight/Gap Tracking: Removed over-engineered insight and gap tracking from research graph
  - Removed `InsightRecord`, `GapRecord`, `InsightAnalysis`, `InsightStatus`, `GapSeverity` models
  - Removed `format_analysis_for_prompt()` helper
  - Removed `INSIGHT_AGENT_PROMPT` from prompts
0.20.2 - 2025-12-12
Fixed
- LLM Schema Compliance: Improved prompts to prevent LLMs from returning objects instead of plain strings for `list[str]` fields
  - All graph prompts now explicitly state that list fields must contain plain strings only
  - Added missing `query` and `confidence` fields to search agent output format documentation
  - Fixes validation errors with less capable models that ignore JSON schema constraints
- AG-UI Frontend Types: Fixed TypeScript interfaces in ag-ui-research example to match backend Python models
  - `EvaluationResult`: `confidence` → `confidence_score`, `should_continue` → `is_sufficient`, `gaps_identified` → `gaps`, `follow_up_questions` → `new_questions`, added `key_insights`
  - `ResearchReport`: `question` → `title`, `summary` → `executive_summary`, `findings` → `main_findings`, removed `insights_used`/`methodology`, added `limitations`/`recommendations`/`sources_summary`
  - Updated Final Report UI to display new fields (Recommendations, Limitations, Sources)
- Citation Formatting: Citations in CLI now render properly with Rich panels
  - Content is rendered as markdown with proper code block formatting
  - No longer truncates or flattens newlines in citation content
0.20.1 - 2025-12-11
Added
- Search Filter for Graphs: Research and Deep QA graphs now support `search_filter` parameter to restrict searches to specific documents
  - Set `state.search_filter` to a SQL WHERE clause (e.g., `"id IN ('doc1', 'doc2')"`) before running the graph
  - Enables document-scoped research workflows
  - CLI: `haiku-rag research "question" --filter "uri LIKE '%paper%'"`
  - CLI: `haiku-rag ask "question" --filter "title = 'My Doc'"`
  - Python: `client.ask(question, filter="...")` and `agent.answer(question, filter="...")`
- AG-UI Research Example: Added bidirectional state demonstration with document filter
  - New `/api/documents` endpoint to list available documents
  - Frontend document selector component with search and multi-select
  - Demonstrates client-to-server state flow via AG-UI protocol
- Inspector Info Modal: New `i` keyboard shortcut opens a modal displaying database information
Changed
- Inspector Lazy Loading: Chunks panel now loads chunks in batches of 50 with infinite scroll
  - Fixes unresponsive UI when viewing documents with large numbers of chunks
  - New `ChunkRepository.get_by_document_id()` pagination with `limit` and `offset` parameters
  - New `ChunkRepository.count_by_document_id()` method
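
The batched loading described above follows the usual limit/offset paging pattern, sketched here against an in-memory list. The `fetch` callable is a hypothetical stand-in for a paginated repository query and `total` for a count query; neither reflects the actual `ChunkRepository` signatures.

```python
from collections.abc import Callable, Iterator


def iter_pages(
    fetch: Callable[[int, int], list[str]],
    total: int,
    page_size: int = 50,
) -> Iterator[list[str]]:
    """Yield successive pages until `total` rows have been seen."""
    offset = 0
    while offset < total:
        page = fetch(page_size, offset)
        if not page:
            break  # defensive: the data shrank since the count was taken
        yield page
        offset += len(page)


# In-memory stand-in for a document with 120 chunks.
chunks = [f"chunk-{i}" for i in range(120)]


def fetch(limit: int, offset: int) -> list[str]:
    return chunks[offset : offset + limit]


pages = list(iter_pages(fetch, total=len(chunks)))
print([len(p) for p in pages])
```

A UI with infinite scroll would request the next page only when the user nears the end of the loaded rows instead of draining the iterator eagerly.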
0.20.0 - 2025-12-10
Added
- DoclingDocument Storage: Full DoclingDocument JSON is now stored with each document, enabling rich context and visual grounding
  - Documents store the complete DoclingDocument structure (JSON) and schema version
  - Chunks store metadata with JSON pointer references (`doc_item_refs`), semantic labels, section headings, and page numbers
  - New `ChunkMetadata` model for structured chunk provenance: `doc_item_refs`, `headings`, `labels`, `page_numbers`
  - `Document.get_docling_document()` method to parse stored DoclingDocument
  - `ChunkMetadata.resolve_doc_items()` to resolve JSON pointer refs to actual DocItem objects
  - `ChunkMetadata.resolve_bounding_boxes()` for visual grounding with page coordinates
  - LRU cache (100 documents) for parsed DoclingDocument objects to avoid repeated JSON parsing
- Enhanced Search Results: `search()` and `expand_context()` now return full provenance information
  - `SearchResult` includes `page_numbers`, `headings`, `labels`, and `doc_item_refs`
  - QA and research agents use provenance for better citations (page numbers, section headings)
- Type-Aware Context Expansion: `expand_context()` now uses document structure for intelligent expansion
  - Structural content (tables, code blocks, lists) expands to complete structures regardless of chunking
  - Text content uses radius-based expansion via `text_context_radius` setting
  - `max_context_items` and `max_context_chars` settings control expansion limits
  - `SearchResult.format_for_agent()` method formats expanded results with metadata for LLM consumption
- Visual Grounding: View page images with highlighted bounding boxes for chunks
  - Inspector modal with keyboard navigation between pages
  - CLI command: `haiku-rag visualize <chunk_id>`
  - Requires `textual-image` dependency and terminal with image support
- Processing Primitives: New methods for custom document processing pipelines
  - `convert()` - Convert files, URLs, or text to DoclingDocument
  - `chunk()` - Chunk a DoclingDocument into Chunk objects
  - `contextualize()` - Prepend section headings to chunk content for embedding
  - `embed_chunks()` - Generate embeddings for chunks
- New `import_document()` Method: Import pre-processed documents with custom chunks
  - Accepts `DoclingDocument` directly for rich metadata (visual grounding, page numbers)
  - Use when document conversion, chunking, or embedding were done externally
  - Chunks without embeddings are automatically embedded
- Automatic Chunk Embedding: `import_document()` and `update_document()` automatically embed chunks that don't have embeddings
  - Pass chunks with or without embeddings - missing embeddings are generated
  - Chunks with pre-computed embeddings are stored as-is
- Format Parameter for Text Conversion: New `format` parameter for `convert()` and `create_document()` to specify content type
  - Supports `"md"` (default) for markdown and `"html"` for HTML content
  - HTML format preserves document structure (headings, lists, sections) in DoclingDocument
  - Enables proper parsing of HTML content that was previously treated as plain text
- Inspector Context Modal: Press `c` in the inspector to view expanded context for the selected chunk
- Auto-Vacuum Configuration: New `storage.auto_vacuum` setting to control automatic vacuuming behavior
  - When `true` (default), vacuum runs automatically after document create/update operations and rebuilds
  - When `false`, vacuum only runs via explicit `haiku-rag vacuum` command
  - Disabling can help avoid potential crashes in high-concurrency scenarios due to LanceDB race conditions
Changed
- BREAKING: `create_document()` API: Removed `chunks` parameter
  - `create_document()` now always processes content (converts, chunks, embeds)
  - Use `import_document()` for pre-processed documents with custom chunks
- BREAKING: `update_document()` API: Unified with `update_document_fields()`
  - Old: `update_document(document)` - pass modified Document object
  - New: `update_document(document_id, content=, metadata=, chunks=, title=, docling_document=)`
  - `content` and `docling_document` are mutually exclusive
- BREAKING: Chunker Interface: `DocumentChunker.chunk()` now returns `list[Chunk]` instead of `list[str]`
  - Chunks include structured metadata (doc_item_refs, labels, headings, page_numbers)
- Search Config: New settings in `search` section for search behavior and context expansion
  - `search.limit` - Default number of search results (default: 5). Used by CLI, MCP server, and API when no limit specified
  - `search.context_radius` - DocItems before/after to include for text content expansion (default: 0)
  - `search.max_context_items` - Maximum items in expanded context (default: 10)
  - `search.max_context_chars` - Maximum characters in expanded context (default: 10000)
- Rebuild Performance: Batched database writes during `rebuild` command reduce LanceDB versions by ~98%
  - All rebuild modes (FULL, RECHUNK, EMBED_ONLY) now batch writes across documents
  - Eliminates redundant per-document chunk deletions and vacuum calls
  - Significantly reduces storage overhead and improves rebuild speed for large databases
- Embedding Architecture: Moved embedding generation from `ChunkRepository` to client layer
  - Repository is now a pure persistence layer
  - Client handles embedding via `_ensure_chunks_embedded()`
- Chunk Text Storage: Chunks store raw text; headings prepended only at embedding time
  - Stored chunk content stays clean without duplicate heading prefixes
  - Local and serve chunkers now produce identical output
- Citation Models: Introduced `RawSearchAnswer` for LLM output, `SearchAnswer` with resolved citations
- Page Image Generation: Always enabled for local docling converter (required for visual grounding)
- Download Models Progress: `haiku-rag download-models` now shows real-time progress with Rich progress bars for Ollama model downloads
Removed
- BREAKING: `markdown_preprocessor` Config Option: Use processing primitives (`convert()`, `chunk()`, `embed_chunks()`) for custom pipelines
- `update_document_fields()`: Merged into `update_document()`
Migration
This release requires a database rebuild (`haiku-rag rebuild`) to populate the new DoclingDocument fields.
Existing documents without DoclingDocument data will work but won't have provenance information.
0.19.6 - 2025-12-03
Changed
- BREAKING: Explicit Database Creation: Databases must now be explicitly created before use
  - New `haiku-rag init` command creates a new empty database
  - Python API: `HaikuRAG(path, create=True)` to create database programmatically
  - Operations on non-existent databases raise `FileNotFoundError`
- BREAKING: Embeddings Configuration: Restructured to nested `EmbeddingModelConfig`
  - Config path changed from `embeddings.{provider, model, vector_dim}` to `embeddings.model.{provider, name, vector_dim}`
  - Automatic migration upgrades existing databases to new format
- Database Migrations: Always run when opening an existing database
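
The embeddings config path change above amounts to moving three flat keys under a nested `model` mapping and renaming `model` to `name`. A minimal sketch on plain dictionaries follows; `migrate_embeddings_config` is a hypothetical helper for illustration and is not the project's migration code.

```python
def migrate_embeddings_config(embeddings: dict) -> dict:
    """Move flat provider/model/vector_dim keys under a nested `model` mapping."""
    if isinstance(embeddings.get("model"), dict):
        return dict(embeddings)  # already in the nested format
    moved = {"provider", "model", "vector_dim"}
    # Keep any unrelated settings untouched.
    migrated = {k: v for k, v in embeddings.items() if k not in moved}
    migrated["model"] = {
        "provider": embeddings.get("provider"),
        "name": embeddings.get("model"),  # the old `model` key becomes `name`
        "vector_dim": embeddings.get("vector_dim"),
    }
    return migrated


old = {"provider": "ollama", "model": "qwen3-embedding", "vector_dim": 4096}
new = migrate_embeddings_config(old)
print(new)
```

The check for an existing nested mapping makes the rewrite safe to apply more than once.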
0.19.5 - 2025-12-01
Changed
- Rebuild Performance: Optimized `rebuild --embed-only` to use batch updates via LanceDB's `merge_insert` instead of individual chunk updates, and skip chunks with unchanged embeddings
0.19.4 - 2025-11-28
Added
- Rebuild Modes: New options for `rebuild` command to control what gets rebuilt
  - `--embed-only`: Only regenerate embeddings, keeping existing chunks (fastest option when changing embedding model)
  - `--rechunk`: Re-chunk from existing document content without accessing source files
  - Default (no flag): Full rebuild with source file re-conversion
  - Python API: `rebuild_database(mode=RebuildMode.EMBED_ONLY | RECHUNK | FULL)`
0.19.3 - 2025-11-27
Changed
- Async Chunker: `DoclingServeChunker` now uses `httpx.AsyncClient` instead of sync `requests`
Fixed
- OCR Options: Fixed `DoclingLocalConverter` using base `OcrOptions` class which docling's OCR factory doesn't recognize. Now uses `OcrAutoOptions` for automatic OCR engine selection.
- Dependencies: Added `opencv-python-headless` to the `docling` optional dependency for table structure detection.
0.19.2 - 2025-11-27
Changed
- Async Converters: Made document converters fully async
  - `BaseConverter.convert_file()` and `convert_text()` are now async methods
  - `DoclingLocalConverter` wraps blocking Docling operations with `asyncio.to_thread()`
  - `DoclingServeConverter` now uses `httpx.AsyncClient` instead of sync `requests`
- Async Model Prefetch: `prefetch_models()` is now async
  - Uses `httpx.AsyncClient` for Ollama model pulls
  - Wraps blocking Docling and HuggingFace downloads with `asyncio.to_thread()`
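
The `asyncio.to_thread()` wrapping mentioned above can be sketched as follows. `blocking_convert` is a hypothetical stand-in for a blocking conversion call; it does not reproduce the converter classes named in these entries.

```python
import asyncio
import time


def blocking_convert(path: str) -> str:
    """Stand-in for a blocking conversion call."""
    time.sleep(0.1)
    return f"converted:{path}"


async def convert_file(path: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_convert, path)


async def main() -> list[str]:
    # Both conversions overlap because each runs in its own worker thread.
    return await asyncio.gather(convert_file("a.pdf"), convert_file("b.pdf"))


results = asyncio.run(main())
print(results)
```

`asyncio.gather()` returns results in the order the awaitables were passed, regardless of which thread finishes first.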
0.19.1 - 2025-11-26
Added
- LM Studio Provider: Added support for LM Studio as a provider for embeddings and QA/research models
  - Configure with `provider: lm_studio` in embeddings, QA, or research model settings
  - Supports thinking control for reasoning models (gpt-oss, etc.)
  - Default base URL: `http://localhost:1234`
Fixed
- Configuration: Fixed `init-config` command generating invalid configuration files (#165)
  - Refactored `generate_default_config()` to use Pydantic model serialization instead of manual dict construction
  - Updated `qa`, `research`, and `reranking` sections to use new `ModelConfig` structure
0.19.0 - 2025-11-25
Added
- Model Customization: Added support for per-model configuration settings
  - New `enable_thinking` parameter to control reasoning behavior (true/false/None)
  - Support for `temperature` and `max_tokens` settings on QA and research models
  - All settings apply to any provider that supports them
- Database Inspector: New `inspect` CLI command launches interactive TUI for browsing documents and chunks & searching
- Evaluations: Added `evaluations` CLI script for running benchmarks (replaces `python -m evaluations.benchmark`)
- Evaluations: Added `--db` option to override evaluation database path
  - Default database location moved to haiku.rag data directory:
    - macOS: `~/Library/Application Support/haiku.rag/evaluations/dbs/`
    - Linux: `~/.local/share/haiku.rag/evaluations/dbs/`
    - Windows: `C:/Users/<USER>/AppData/Roaming/haiku.rag/evaluations/dbs/`
  - Previously stored in `evaluations/data/` within the repository
- Evaluations: Added comprehensive experiment metadata tracking for better reproducibility
  - Records dataset name, test case count, and all model configurations
  - Tracks embedder settings: provider, model, and vector dimensions
  - Tracks QA model: provider and model name
  - Tracks judge model: provider and model name for LLM evaluation
  - Tracks processing parameters: `chunk_size` and `context_chunk_radius`
  - Tracks retrieval configuration: `retrieval_limit` for number of chunks retrieved
  - Tracks reranking configuration: `rerank_provider` and `rerank_model`
  - Enables comparison of evaluation runs with different configurations in Logfire
- Evaluations: Refactored retrieval evaluation to use pydantic-ai experiment framework
  - New `evaluators` module with `MRREvaluator` (Mean Reciprocal Rank) and `MAPEvaluator` (Mean Average Precision)
  - Retrieval benchmarks now use `Dataset.evaluate()` with full Logfire experiment tracking
  - Dataset specifications now declare their retrieval evaluator (MRR for RepliQA, MAP for Wix)
  - Replaced Recall@K and Success@K with industry-standard MRR and MAP metrics
  - Unified evaluation framework for both retrieval and QA benchmarks
- AG-UI Events: Enhanced ActivitySnapshot events with richer structured data
  - Added `stepName` field to identify which graph node emitted each activity
  - Added structured fields to activity content while preserving backward-compatible `message` field:
    - Planning: `sub_questions` - list of sub-question strings
    - Searching: `query` - the search query, `confidence` - answer confidence (on success), `error` - error message (on failure)
    - Analyzing (research): `insights` - list of insight objects, `gaps` - list of gap objects, `resolved_gaps` - list of resolved gap strings
    - Evaluating (research): `confidence` - confidence score, `is_sufficient` - sufficiency flag
    - Evaluating (deep QA): `is_sufficient` - sufficiency flag, `iterations` - iteration count
Changed
- Evaluations: Renamed `--qa-limit` CLI parameter to `--limit`, now applies to both retrieval and QA benchmarks
- Evaluations: Retrieval evaluator selection moved from runtime logic to dataset configuration
0.18.0 - 2025-11-21
Added
- Manual Vector Indexing: New `create-index` CLI command for explicit vector index creation
  - Creates IVF_PQ indexes
  - Requires minimum 256 chunks (LanceDB training data requirement)
  - New `search.vector_index_metric` config option: `cosine` (default), `l2`, or `dot`
  - New `search.vector_refine_factor` config option (default: 30) for accuracy/speed tradeoff
  - Indexes not created automatically during ingestion to avoid performance degradation
  - Manual rebuilding required after adding significant new data
- Enhanced Info Command: `haiku-rag info` now shows storage sizes and vector index statistics
  - Displays storage size for documents and chunks tables in human-readable format
  - Shows vector index status (exists/not created)
  - Shows indexed and unindexed chunk counts for monitoring index staleness
Changed
- BREAKING: Default Embedding Model: Changed default embedding model from `qwen3-embedding` to `qwen3-embedding:4b` with vector dimension 2560 (previously 4096)
  - New installations will use the smaller, more efficient 4B parameter model by default
  - Action required: Existing databases created with the old default will be incompatible. Users must either:
    - Explicitly set `embeddings.model: "qwen3-embedding"` and `embeddings.vector_dim: 4096` in their config to maintain compatibility with existing databases
    - Or run `haiku-rag rebuild` to re-embed all documents with the new default
  - This change provides better performance for most use cases while reducing resource requirements
- Evaluations: Improved evaluation dataset naming and simplified evaluator configuration
  - `EvalDataset` now accepts dataset name for better organization in Logfire
  - Added `--name` CLI parameter to override evaluation run names
  - Removed `IsInstance` evaluator, using only `LLMJudge` for QA evaluation
- Search Accuracy: Applied `refine_factor` to vector and hybrid searches for improved accuracy
  - Retrieves `refine_factor * limit` candidates and re-ranks in memory
  - Higher values increase accuracy but slow down queries
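
The over-fetch and re-rank idea behind `refine_factor` can be sketched in pure Python. This is a toy model, not LanceDB's implementation: the sign-only `coarse()` function is an assumed stand-in for the lossy vectors an approximate index compares against, and all names here are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def coarse(v: list[float]) -> list[float]:
    """Toy lossy representation: keep only the sign of each component."""
    return [1.0 if x >= 0 else -1.0 for x in v]


def search(query, vectors, limit, refine_factor):
    # Step 1: approximate ranking against the lossy representation.
    approx = sorted(vectors, key=lambda k: cosine(query, coarse(vectors[k])), reverse=True)
    candidates = approx[: refine_factor * limit]
    # Step 2: re-rank only the candidates in memory using the exact vectors.
    exact = sorted(candidates, key=lambda k: cosine(query, vectors[k]), reverse=True)
    return exact[:limit]


vectors = {
    "a": [0.2, 0.9],
    "b": [0.9, 0.2],
    "c": [1.0, 0.1],
    "d": [-1.0, 0.5],
}
query = [1.0, 0.1]

without_refine = search(query, vectors, limit=1, refine_factor=1)
with_refine = search(query, vectors, limit=1, refine_factor=3)
print(without_refine, with_refine)
```

With `refine_factor=1` the coarse ranking cannot tell `a`, `b`, and `c` apart and returns whichever comes first; with a larger factor the exact re-ranking step recovers the true nearest neighbour at the cost of scoring more candidates.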
Fixed
- AG-UI Activity Events: Activity events now correctly use structured dict content instead of strings
- Graph Configuration: Graph builder functions now properly accept and use non-global config (#149)
  - `build_research_graph()` and `build_deep_qa_graph()` now pass config to all agents and model creation
  - `get_model()` utility function accepts `config` parameter (defaults to global Config)
  - Allows creating multiple graphs with different configurations in the same application
0.17.2 - 2025-11-19
Added
- Document Update API: New `update_document_fields()` method for partial document updates
  - Update individual fields (content, metadata, title, chunks) without fetching full document
  - Support for custom chunks or auto-generation from content
Changed
- Chunk Creation: `ChunkRepository.create()` now accepts both single chunks and lists for batch insertion
  - Batch insertion reduces LanceDB version creation when adding multiple chunks with custom chunks
  - Batch embedding generation for improved performance with multiple chunks
- Updated core dependencies
0.17.1 - 2025-11-18
Added
- Conversion Options: Fine-grained control over document conversion for both local and remote converters
  - New `conversion_options` config section in `ProcessingConfig`
  - OCR settings: `do_ocr`, `force_ocr`, `ocr_lang` for controlling OCR behavior
  - Table extraction: `do_table_structure`, `table_mode` (fast/accurate), `table_cell_matching`
  - Image settings: `images_scale` to control image resolution
  - Options work identically with both `docling-local` and `docling-serve` converters
Changed
- Increased reranking candidate retrieval multiplier from 3x to 10x for improved result quality
- Docker Images: Main `haiku.rag` image no longer automatically built and published
- Conversion Options: Removed the legacy `pdf_backend` setting; docling now chooses the optimal backend automatically
0.17.0 - 2025-11-17
Added
- Remote Processing: Support for docling-serve as remote document processing and chunking service
  - New `converter` config option: `docling-local` (default) or `docling-serve`
  - New `chunker` config option: `docling-local` (default) or `docling-serve`
  - New `providers.docling_serve` config section with `base_url`, `api_key`, and `timeout`
  - Comprehensive error handling for connection, timeout, and authentication issues
- Chunking Strategies: Support for both hybrid and hierarchical chunking
  - New `chunker_type` config option: `hybrid` (default) or `hierarchical`
  - Hybrid chunking: Structure-aware splitting that respects document boundaries
  - Hierarchical chunking: Preserves document hierarchy for nested documents
- Table Serialization Control: Configurable table representation in chunks
  - New `chunking_use_markdown_tables` config option (default: `false`)
  - `false`: Tables serialized as narrative text ("Value A, Column 2 = Value B")
  - `true`: Tables preserved as markdown format with structure
- Chunking Configuration: Additional chunking control options
  - New `chunking_merge_peers` config option (default: `true`) to merge undersized successive chunks
- Docker Images: Two Docker images for different deployment scenarios
  - `haiku.rag`: Full image with all dependencies for self-contained deployments
  - `haiku.rag-slim`: Minimal image designed for use with external docling-serve
  - Multi-platform support (linux/amd64, linux/arm64)
  - Docker Compose examples with docling-serve integration
  - Automated CI/CD workflows for both images
  - Build script (`scripts/build-docker-images.sh`) for local multi-platform builds
Changed
- BREAKING: Chunking Tokenizer: Switched from tiktoken to HuggingFace tokenizers for consistency with docling-serve
  - Default tokenizer changed from tiktoken "gpt-4o" to "Qwen/Qwen3-Embedding-0.6B"
  - New `chunking_tokenizer` config option in `ProcessingConfig` for customization
  - `download-models` CLI command now also downloads the configured HuggingFace tokenizer
- Docker Examples: Updated examples to demonstrate remote processing
  - `examples/docker` now uses slim image with docling-serve
  - `examples/ag-ui-research` backend uses slim image with docling-serve
  - Configuration examples include remote processing setup
0.16.1 - 2025-11-14
Changed
- Evaluations: Refactored QA benchmark to run entire dataset as single evaluation for better Logfire experiment tracking
- Evaluations: Added `.env` file loading support via `python-dotenv` dependency
0.16.0 - 2025-11-13
Added
- AG-UI Protocol Support: Full AG-UI (Agent-UI) protocol implementation for graph execution with event streaming
  - New `AGUIEmitter` class for emitting AG-UI events from graphs
  - Support for all AG-UI event types: lifecycle events (`RUN_STARTED`, `RUN_FINISHED`, `RUN_ERROR`), step events (`STEP_STARTED`, `STEP_FINISHED`), state updates (`STATE_SNAPSHOT`, `STATE_DELTA`), activity narration (`ACTIVITY_SNAPSHOT`), and text messages (`TEXT_MESSAGE_CHUNK`)
  - `AGUIConsoleRenderer` for rendering AG-UI event streams to terminal with Rich formatting
  - `stream_graph()` utility function for executing graphs with AG-UI event emission
  - State diff computation for efficient state synchronization
- Delta State Updates: AG-UI emitter now supports incremental state updates via JSON Patch operations (`STATE_DELTA` events) to reduce bandwidth, configurable via `use_deltas` parameter (enabled by default)
- AG-UI Server: Starlette-based HTTP server for serving graphs via AG-UI protocol
  - Server-Sent Events (SSE) streaming endpoint at `/v1/agent/stream`
  - Health check endpoint at `/health`
  - Full CORS support configurable via `agui` config section
  - `create_agui_server()` function for programmatic server creation
- Deep QA AG-UI Support: Deep QA graph now fully supports AG-UI event streaming
  - Integration with `AGUIEmitter` for progress tracking
  - Step-by-step execution visibility via AG-UI events
- CLI AG-UI Flag: New `--agui` flag for `serve` command to start AG-UI server
- Graph Module: New unified `haiku.rag.graph` module containing all graph-related functionality
- Common Graph Nodes: New factory functions (`create_plan_node`, `create_search_node`) in `haiku.rag.graph.common.nodes` for reusable graph components
- AG-UI Research Example: New full-stack example (`examples/ag-ui-research`) demonstrating agent+graph architecture with CopilotKit frontend
  - Pydantic AI agent with research tool that invokes the research graph
  - Custom AG-UI streaming endpoint with anyio memory streams
  - React/Next.js frontend with split-pane UI showing live research state
  - Real-time progress tracking of questions, answers, insights, and gaps
  - Docker Compose setup for easy local development
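
The delta state updates listed in this release send JSON Patch operations instead of full state snapshots. The sketch below shows one way such a diff could be computed for nested dictionaries. It is an illustration under stated assumptions, not the emitter's actual code: `diff_state` and `escape` are hypothetical names, keys are assumed to be strings, and lists are replaced wholesale rather than diffed element by element.

```python
def escape(key: str) -> str:
    """Escape a key for use in a JSON Pointer (RFC 6901)."""
    return key.replace("~", "~0").replace("/", "~1")


def diff_state(old: dict, new: dict, path: str = "") -> list[dict]:
    """Compute JSON Patch (RFC 6902) operations that turn `old` into `new`."""
    ops: list[dict] = []
    for key in old:
        pointer = f"{path}/{escape(key)}"
        if key not in new:
            ops.append({"op": "remove", "path": pointer})
        elif isinstance(old[key], dict) and isinstance(new[key], dict):
            ops.extend(diff_state(old[key], new[key], pointer))  # recurse into objects
        elif old[key] != new[key]:
            ops.append({"op": "replace", "path": pointer, "value": new[key]})
    for key in new:
        if key not in old:
            ops.append({"op": "add", "path": f"{path}/{escape(key)}", "value": new[key]})
    return ops


old_state = {"question": "q", "iterations": 1, "answers": {"a1": "x"}}
new_state = {"question": "q", "iterations": 2, "answers": {"a1": "x", "a2": "y"}}
patch = diff_state(old_state, new_state)
print(patch)
```

Only the changed counter and the newly added answer appear in the patch, which is why deltas reduce bandwidth compared with resending the whole state.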
Changed
- Vacuum Retention: Default `vacuum_retention_seconds` increased from 60 seconds to 86400 seconds (1 day) for better version retention in typical workflows
- BREAKING: Major refactoring of graph-related code into unified `haiku.rag.graph` module structure:
  - `haiku.rag.research` → `haiku.rag.graph.research`
  - `haiku.rag.qa.deep` → `haiku.rag.graph.deep_qa`
  - `haiku.rag.agui` → `haiku.rag.graph.agui`
  - `haiku.rag.graph_common` → `haiku.rag.graph.common`
- BREAKING: Research and Deep QA graphs now use AG-UI event protocol instead of direct console logging
  - Removed `console` and `stream` parameters from graph dependencies
  - All progress updates now emit through `AGUIEmitter`
- BREAKING: `ResearchState` converted from dataclass to Pydantic `BaseModel` for JSON serialization and AG-UI compatibility
- Research and Deep QA graphs now emit detailed execution events for better observability
- CLI research command now uses AG-UI event rendering for `--verbose` output
- Improved graph execution visibility with step-by-step progress tracking
- Updated all documentation to reflect new import paths and AG-UI usage
- Updated examples (ag-ui-research, a2a-server) to use new import paths
Fixed
- Document Creation: Optimized `create_document` to skip unnecessary DoclingDocument conversion when chunks are pre-provided
- FileReader: Error messages now include both original exception details and file path for easier debugging
- Database Auto-creation: Read operations (search, list, get, ask, research) no longer auto-create empty databases. Write operations (add, add-src, delete, rebuild) still create the database as needed. This prevents the confusing scenario where a search query creates an empty database. Fixes issue #137.
Removed
- BREAKING: Removed `disable_autocreate` config option - the behavior is now automatic based on operation type
- BREAKING: Removed legacy `ResearchStream` and `ResearchStreamEvent` classes (replaced by AG-UI event protocol)
0.15.0 - 2025-11-07
Added
- File Monitor: Orphan deletion feature - automatically removes documents from database when source files are deleted (enabled via `monitor.delete_orphans` config option, default: false)
Changed
- Configuration: All CLI commands now properly support `--config` parameter for specifying custom configuration files
  - Configuration loading consolidated across CLI, app, and client with consistent resolution order
  - `HaikuRAGApp` and MCP server now accept `config` parameter for programmatic configuration
  - Updated CLI documentation to clarify global vs per-command options
- BREAKING: Standardized configuration filename to `haiku.rag.yaml` in user directories (was incorrectly using `config.yaml`). Users with existing `config.yaml` in their user directory will need to rename it to `haiku.rag.yaml`
Fixed
- File Monitor: Fixed incorrect "Updated document" logging for unchanged files - monitor now properly skips files when MD5 hash hasn't changed
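
The hash-based skip described in this fix boils down to comparing a stored digest with the digest of the current file contents. A minimal sketch follows; `md5_of` and `needs_update` are hypothetical helpers, not the monitor's actual functions.

```python
import hashlib
from typing import Optional


def md5_of(data: bytes) -> str:
    """Hex MD5 digest of the file contents (change detection only, not security)."""
    return hashlib.md5(data).hexdigest()


def needs_update(stored_md5: Optional[str], data: bytes) -> bool:
    """True when the file is new or its contents changed since the last import."""
    return stored_md5 != md5_of(data)


stored = md5_of(b"hello world")
print(needs_update(stored, b"hello world"))   # unchanged file
print(needs_update(stored, b"hello world!"))  # edited file
```

A file whose modification time changed but whose bytes did not yields the same digest, so it is skipped without being logged as updated.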
Removed
- BREAKING: A2A (Agent-to-Agent) protocol support has been moved to a separate self-contained package in `examples/a2a-server/`. The A2A server is no longer part of the main haiku.rag package. Users who need A2A functionality can install and run it from the examples directory with `cd examples/a2a-server && uv sync`.
- BREAKING: Removed deprecated `.env`-based configuration system. The `haiku-rag init-config --from-env` command and `load_config_from_env()` function have been removed. All configuration must now be done via YAML files. Environment variables for API keys (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`) and service URLs (e.g., `OLLAMA_BASE_URL`) are still supported and can be set via `.env` files.
0.14.1 - 2025-11-06
Added
- Migrated research and deep QA agents to use Pydantic Graph beta API for better graph execution
- Automatic semaphore-based concurrency control for parallel sub-question processing
- `max_concurrency` parameter for controlling parallel execution in research and deep QA (default: 1)
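
Semaphore-based concurrency control of the kind described above can be sketched with `asyncio.Semaphore`. The `process_all` function and its body are hypothetical; the sleep is a stand-in for the real search and model calls, and the `peak` counter exists only to make the limit observable.

```python
import asyncio


async def process_all(questions: list[str], max_concurrency: int = 1) -> tuple[list[str], int]:
    semaphore = asyncio.Semaphore(max_concurrency)
    running = 0
    peak = 0

    async def answer(question: str) -> str:
        nonlocal running, peak
        async with semaphore:  # at most max_concurrency bodies run at once
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for search + model call
            running -= 1
            return f"answer to {question}"

    # All tasks are scheduled immediately; the semaphore throttles execution.
    answers = await asyncio.gather(*(answer(q) for q in questions))
    return list(answers), peak


answers, peak = asyncio.run(process_all(["q1", "q2", "q3", "q4"], max_concurrency=2))
print(answers, peak)
```

With the default of 1 the sub-questions run strictly one after another, which is the conservative choice for rate-limited or local model backends.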
Changed
- BREAKING: Research and Deep QA graphs now use `pydantic_graph.beta` instead of the class-based graph implementation
- Refactored graph common patterns into `graph_common` module
- Sub-questions now process using `.map()` for true parallel execution
- Improved graph structure with cleaner node definitions and flow control
- Pinned critical dependencies: `docling-core`, `lancedb`, `docling`
0.14.0 - 2025-11-05
Added
- New `haiku.rag-slim` package with minimal dependencies for users who want to install only what they need
- Evaluations package (`haiku.rag-evals`) for internal benchmarking and testing
- Improved search filtering performance by using pandas DataFrames for joins instead of SQL WHERE IN clauses
Changed
- BREAKING: Restructured project into UV workspace with three packages:
  - `haiku.rag-slim` - Core package with minimal dependencies
  - `haiku.rag` - Full package with all extras (recommended for most users)
  - `haiku.rag-evals` - Internal benchmarking and evaluation tools
- Migrated from `pydantic-ai` to `pydantic-ai-slim` with extras system
- Docling is now an optional dependency (install with `haiku.rag-slim[docling]`)
- Package metadata checks now use `haiku.rag-slim` (always present) instead of `haiku.rag`
- Docker image optimized: removed evaluations package, reducing installed packages from 307 to 259
- Improved vector search performance through optimized score normalization
Fixed
- ImportError now properly raised when optional docling dependency is missing
0.13.3 - 2025-11-04
Added
- Support for Zero Entropy reranker
- Filter parameter to `search()` for filtering documents before search
- Filter parameter to CLI `search` command
- Filter parameter to CLI `list` command for filtering document listings
- Config option to pass custom configuration files to evaluation commands
- Document filtering now respects configured include/exclude patterns when using `add-src` with directories
- Max retries to insight_agent when producing structured output
Fixed
- CLI now loads `.env` files at startup
- Info command no longer attempts to use deprecated `.env` settings
- Documentation typos
0.13.2 - 2025-11-04
Added
- Gitignore-style pattern filtering for file monitoring using pathspec
- Include/exclude pattern documentation for FileMonitor
Changed
- Moved monitor configuration to its own section in config
- Improved configuration documentation
- Updated dependencies
0.13.1 - 2025-11-03
Added
- Initial version tracking