Skip to content

Changelog

Unreleased

0.33.1 - 2026-03-06

Changed

  • Default model temperatures: Set task-appropriate temperature defaults — 0.3 for QA, research, and title generation; 0.0 for RLM and picture description. Previously unset (provider defaults, typically 0.7–1.0).
  • QA thinking enabled by default: enable_thinking now defaults to True for QA agent, improving answer quality with reasoning models.
  • Default title max_tokens: Set max_tokens=100 for title generation model to keep titles concise
  • Evaluation judge: Set temperature=0.0 and enable_thinking=True for deterministic, higher-quality judging. Removed unused judge config from retrieval benchmarks.
  • Test suite cleanup: Removed stale VCR cassettes, dead fixtures, orphaned directories, and redundant tests. Strengthened weak assertions across search, context enhancement, and converter tests. Relocated misplaced SearchResult._get_primary_label test to test_search.py
  • Parallel test execution: Added pytest-xdist and enabled parallel test runs by default (-n auto), reducing test suite time from ~3.5 min to ~2 min

0.33.0 - 2026-03-04

Added

  • Module-level skill introspection API: STATE_TYPE, STATE_NAMESPACE, skill_metadata(), instructions(), and state_metadata() on haiku.rag.skills.rag and haiku.rag.skills.rlm — allows introspecting skill configuration without calling create_skill()
  • Automatic structured output detection: Native JSON schema output is used automatically when the model supports it, with tool-call fallback otherwise. No configuration needed.

Changed

  • haiku.skills dependency: Bumped to >=0.7.0 for StateMetadata dataclass

0.32.3 - 2026-03-03

Changed

  • AG-UI skill streaming: Tool calls within skills are now streamed as real-time AG-UI events to the frontend. Requires haiku.skills>=0.6.0

Fixed

  • Search tool regression: Removed LLM-facing filter parameter from search and list_documents tools. The SQL WHERE clause description confused LLMs, degrading QA accuracy. Document filtering is now handled programmatically via base_filter and state.document_filter

0.32.2 - 2026-02-28

Fixed

  • Compatibility with haiku.skills 0.5.1: Replaced removed SkillToolset.system_prompt with build_system_prompt(toolset.skill_catalog) across chat TUI, backend app, and examples
  • Minimum dependency: Bumped haiku.skills requirement to >=0.5.1
  • Chat model default: Chat TUI and backend app now use the configured QA model instead of hardcoded openai:gpt-4o

0.32.1 - 2026-02-26

Added

  • Automatic title generation: Documents can now have titles auto-generated during ingestion via processing.auto_title: true. Uses two-tier extraction: structural metadata from DoclingDocument (HTML <title>, h1, section headers) first, with LLM fallback via configurable processing.title_model
  • generate_title(): Public method on HaikuRAG to generate a title for an existing document on demand
  • rebuild --title-only: New rebuild mode that generates titles only for untitled documents without re-chunking or re-embedding
  • add --title: CLI option to set a title when adding text documents

0.32.0 - 2026-02-24

Changed

  • RLM sandbox: Replaced Docker-based code execution with pydantic-monty, a minimal secure Python interpreter written in Rust. Eliminates Docker as a runtime dependency for RLM with sub-millisecond sandbox startup
  • RLM sandbox functions: Added get_chunk(chunk_id) for retrieving chunk content and metadata from search results. get_docling_document(document_id) now returns the full document structure as a JSON dict. All sandbox functions now require await
  • RLMConfig: Removed docker_image and docker_memory_limit fields

Added

  • RLM sandbox regex functions: regex_findall, regex_sub, regex_search, regex_split for pattern matching without LLM calls
  • HaikuRAG.get_chunk_by_id(): Public method for chunk lookup by ID

Removed

  • docker_sandbox.py, runner.py: Docker container plumbing replaced by sandbox.py

0.31.1 - 2026-02-20

Fixed

  • info and history commands: Open database in read-only mode to prevent write failures on read-only filesystems

0.31.0 - 2026-02-20

Added

  • RAG skill (haiku.rag.skills.rag): haiku.skills integration with search, list_documents, get_document, ask, and research tools plus managed RAGState
  • RLM skill (haiku.rag.skills.rlm): haiku.skills integration with analyze tool for computational analysis via code execution
  • HaikuRAG.research(): Client method for multi-agent research
  • haiku.skills entry points: rag = "haiku.rag.skills.rag:create_skill", rag-rlm = "haiku.rag.skills.rlm:create_skill"

Changed

  • Chat TUI: Rebuilt on RAG skill + haiku.skills SkillToolset
  • Web app backend: Rebuilt on RAG skill + AGUIAdapter
  • Toolsets simplified: Removed ToolContext, SessionState, AgentDeps, Toolkit; kept core FunctionToolset factories
  • Research graph: Removed session_context and conversational output mode

Removed

  • agents/chat/: Entire chat agent module (replaced by RAG skill)
  • --deep flag: Removed from ask CLI (use research command instead)
  • --context/--context-file: Removed from ask CLI
  • tools/ state machinery: ToolContext, ToolContextCache, SessionState, AgentDeps, Toolkit, etc.

0.30.2 - 2026-02-19

Fixed

  • Added cachetools as an explicit dependency (was only available transitively, causing ModuleNotFoundError for some installations)
  • download-models: Show actionable error message when Ollama is not running instead of cryptic "All connection attempts failed" (#277)

0.30.1 - 2026-02-17

Changed

  • AG-UI state sync: ask tool now emits StateDeltaEvent (JSON Patch) instead of StateSnapshotEvent, consistent with the search tool

0.30.0 - 2026-02-16

Added

  • Composable toolsets: New haiku.rag.tools module with reusable FunctionToolset factories that can be mixed into any pydantic-ai agent
  • create_search_toolset() — hybrid search with context expansion and citation tracking
  • create_document_toolset() — document listing, retrieval, and summarization
  • create_qa_toolset() — question answering via research graph with prior answer recall
  • create_analysis_toolset() — computational analysis via RLM agent (Docker sandbox)
  • Toolkit and build_toolkit(): High-level factory that bundles toolsets, prompt, and context creation for a given feature set. Reduces agent composition from ~15 lines to ~5. build_chat_toolkit() adds chat-specific defaults (background summarization callback)
  • ToolContext: Namespace-based state container shared across toolsets. Toolsets register Pydantic models under string namespaces, enabling state accumulation (search results, citations, QA history) across invocations
  • ToolContextCache: In-memory TTL-based cache for ToolContext instances, keyed by external session/thread ID. Replaces module-level caches for embeddings and summaries
  • run_qa_core(): Extracted core QA function for direct programmatic use without an agent
  • Feature-based chat agent: create_chat_agent() accepts a features list to select which toolsets are enabled (search, documents, qa, analysis). System prompt is composed to match
  • New documentation: docs/tools.md covers all toolsets, ToolContext, state management, filter helpers, and composing custom agents

Changed

  • Toolset factories decoupled from runtime dependencies: create_search_toolset(), create_qa_toolset(), create_document_toolset(), create_analysis_toolset(), and create_chat_agent() no longer take client or context parameters. Instead, tool functions receive these via pydantic-ai's RunContext.deps. This enables toolset and agent creation at configuration time (cacheable, created once), with only lightweight deps created per-request. Deps must satisfy the RAGDeps protocol (client: HaikuRAG, tool_context: ToolContext | None)
  • Toolset factory return types narrowed to FunctionToolset[RAGDeps]: All four toolset factories now declare their return type as FunctionToolset[RAGDeps] instead of bare FunctionToolset
  • create_chat_agent() accepts optional toolkit parameter: Pass a pre-built Toolkit to share toolsets between agent and context creation, avoiding duplicate construction
  • ChatDeps now includes client: ChatDeps(config=..., client=..., tool_context=...) — the client field was added since it's no longer captured by the agent factory
  • prepare_chat_context() helper: Extracted from create_chat_agent() for idempotent namespace registration, since the agent factory no longer has access to the context
  • Chat agent architecture: Rebuilt on composable toolsets instead of monolithic tool definitions. Chat agent is now a thin wrapper around create_search_toolset, create_document_toolset, create_qa_toolset, and create_analysis_toolset
  • State management simplified: Removed session_id, incoming_session_id, and incoming_session_context from the state layer. ToolContextCache preserves all state (embeddings, summaries, QA history) on cached ToolContext instances, eliminating the need for module-level caches
  • AG-UI state sync: ask tool now emits StateSnapshotEvent instead of StateDeltaEvent, ensuring background summarization results are reliably delivered to clients
  • TUI simplified: Chat TUI reads directly from ToolContext namespace states instead of maintaining a separate ChatSessionState and manually syncing via AG-UI state events
  • AG-UI web app: Uses ToolContextCache to maintain per-thread state across requests
  • Frontend session management: Persistent chat sessions with localStorage, wired to backend ToolContextCache via CopilotKit threadId
  • Session manager dropdown: create, switch, delete, and export sessions to markdown
  • Messages, chat state, and citations restored on session switch
  • Session title derived from first user message
  • Inline citation blocks injected after assistant responses via qa_history correlation

Removed

  • SearchAgent: Replaced by create_search_toolset()
  • Module-level session caches: _session_cache, cache_session_context, get_cached_session_context, cache_question_embedding, get_cached_embedding — all replaced by cached ToolContext
  • ChatSessionState from TUI: TUI no longer maintains its own copy of session state

0.29.1 - 2026-02-10

Fixed

  • Document listing memory usage: list_documents no longer loads full document content and docling blobs by default, preventing out-of-memory errors on large databases. Use include_content=True when content is needed.
  • Chat session_id not persisting across AG-UI requests: ChatSessionState.session_id now defaults to "" instead of auto-generating a UUID. This ensures the session_id assignment is detected as a state change and included in the StateDeltaEvent delta, allowing clients to persist it across requests.

0.29.0 - 2026-02-06

Added

  • docling-serve Chunker OCR Options: The docling-serve chunker now respects OCR settings from conversion_options
  • Passes do_ocr, force_ocr, ocr_engine, and ocr_lang to the chunking API
  • Allows disabling OCR via config when running docling-serve in read-only containers
  • RLM Agent (Recursive Language Model): New agent for complex analytical tasks via sandboxed Python code execution
  • Solves problems traditional RAG can't handle: aggregation, computation, multi-document analysis
  • Docker-based sandbox with full Python environment (no import restrictions)
  • Container reuse within a single rlm() call for reduced latency
  • Available functions: search(), list_documents(), get_document(), get_docling_document(), llm()
  • Pre-loaded documents support via documents variable
  • Context filter for scoping searches without LLM control
  • New client.rlm(question) method on HaikuRAG client
  • New haiku-rag rlm CLI command
  • New rlm_question MCP tool
  • New config options: docker_image, docker_memory_limit
  • CI: Docker sandbox integration tests run in GitHub Actions

Fixed

  • CI: Cache HuggingFace tokenizer to prevent flaky test failures when HuggingFace has transient outages

0.28.0 - 2026-01-31

Changed

  • Iterative Research Planning: Research graph now uses an iterative feedback loop instead of batch question processing
  • Planner proposes ONE question at a time, sees the answer, then decides whether to continue
  • Removes gather_context tool — planner proposes questions directly
  • Simpler flow: plan_nextsearch_one → loop back until complete → synthesize
  • Consolidated build_conversational_graph() into build_research_graph(output_mode="conversational")

Removed

  • Dead config options: Removed vestigial fields from iterative planning refactor
  • confidence_threshold from ResearchConfig and ResearchState (LLM decides completion via is_complete)
  • max_sub_questions from QAConfig (iterative flow uses one question at a time)
  • sub_questions field from ResearchContext (no longer populated)

0.27.2 - 2026-01-29

Added

  • Deep Ask Evaluations: QA benchmarks can now use the research graph for multi-step reasoning
  • New --deep flag on evaluations run enables deep ask mode
  • Uses research graph with max_iterations=2 and confidence_threshold=0.0
  • Evaluation name automatically suffixed with _deep when enabled
  • Experiment metadata includes deep_ask field for tracking
  • Chat Agent Document Awareness Tools: Two new tools for browsing and understanding the knowledge base
  • list_documents — Returns DocumentListResponse with paginated documents (50 per page), page number, total pages, and total count; respects session document filter
  • summarize_document — Generate LLM-powered summaries of specific documents
  • Document Count API: New count_documents(filter) method on HaikuRAG client for efficient document counting
  • Read-Only Initial Context: Initial context is now locked after the first message, providing consistent session context
  • Chat TUI: --initial-context CLI option sets background context for the session
  • Context can be edited via command palette before the first message is sent
  • After first message, context becomes read-only (view only)
  • Clearing chat resets context to CLI value and unlocks editing
  • Web app: Memory panel now serves dual purpose - edit initial context before first message, view session context after
  • Agent uses initial_context as fallback when session_context is empty

Changed

  • AG-UI State Delta Updates: Web application now sends StateDeltaEvent (JSON Patch RFC 6902) instead of full StateSnapshotEvent for state updates
  • Reduces bandwidth when state grows large (e.g., 50 Q&As with citations)
  • First request still sends full snapshot; subsequent requests send only changes
  • Backend logging shows incoming/outgoing state events for debugging

Fixed

  • Chat TUI Session State Sync: TUI now syncs full session state from AG-UI events

0.27.1 - 2026-01-27

Added

  • Initial Context for Chat Sessions: New initial_context field on ChatSessionState allows external clients to seed sessions with background context
  • Static context set once at session creation, used as fallback when no cached session context exists
  • Incorporated into first summarization, after which evolved session_context takes precedence
  • Eliminates need for clients to import and call internal cache functions (cache_session_context, get_cached_session_context)
  • session_id now auto-generates a UUID if not provided (previously defaulted to empty string)

Fixed

  • AG-UI StateSnapshotEvent JSON Serialization: Chat agent tools now use model_dump(mode="json") when creating StateSnapshotEvent
  • Fixes TypeError: Object of type datetime is not JSON serializable when external clients persist AG-UI state to database JSON columns

0.27.0 - 2026-01-26

Added

  • Evaluation Database Hosting: Pre-built evaluation databases available on HuggingFace
  • evaluations download <dataset> downloads pre-built databases from ggozad/haiku-rag-eval-dbs
  • evaluations upload <dataset> uploads databases to HuggingFace (maintainer only)
  • Supports all argument to download/upload all datasets at once
  • Use --force flag to overwrite existing databases
  • Avoids lengthy database rebuild times for users running benchmarks
  • Stable Citation Registry: Citation indices now persist across tool calls within a session
  • Same chunk_id always returns the same citation index (first-occurrence-wins)
  • New citation_registry: dict[str, int] field on ChatSessionState
  • New get_or_assign_index(chunk_id) method for stable index assignment
  • Registry serialized/restored via AG-UI state protocol
  • Prior Answer Recall: The ask tool automatically checks conversation history before research
  • Finds semantically similar prior answers using embedding similarity (0.7 cosine threshold)
  • Relevant prior answers are passed to the research planner as context
  • Planner can return empty sub_questions when context is sufficient, avoiding redundant searches
  • Dynamic Session Context: Compressed conversation history for multi-turn chat
  • New SessionContext model stores summarized conversation state instead of raw Q&A history
  • Background LLM-based summarization runs after each ask tool call (non-blocking)
  • Previous summarization tasks are cancelled when new ones start
  • Research graph receives compact context (~1,000-2,000 tokens) instead of raw qa_history (potentially thousands of tokens)
  • New session_context field on ChatSessionState synced via AG-UI state protocol
  • Chat TUI: New context modal (Ctrl+O) to view current session context
  • Session Document Filter: Restrict all search/ask operations to selected documents
  • New document_filter field on ChatSessionState stores list of document titles/URIs
  • Session filter combines with per-tool document_name filter using AND logic
  • Multi-document selection uses OR logic within the session filter
  • Filter persists across tool calls and chat clears via AG-UI state protocol
  • Chat TUI: Access via command palette ("Filter documents" command)
  • Web Application: Filter button in header shows count of selected documents

Changed

  • Dependencies: Updated core dependencies
  • pydantic-ai-slim: 1.44.0 → 1.46.0
  • lancedb: 0.26.1 → 0.27.0
  • docling: 2.68.0 → 2.69.1
  • docling-core: 2.59.0 → 2.60.1
  • VoyageAI Embeddings: Now uses pydantic-ai-slim's native VoyageAI support instead of custom implementation
  • Removed haiku.rag.embeddings.voyageai module
  • The voyageai extra now delegates to pydantic-ai-slim[voyageai]

Removed

  • Q&A History Functions: Removed standalone conversation history utilities
  • rank_qa_history_by_similarity() - similarity matching now integrated into ask tool
  • format_conversation_context() - replaced by SessionContext summarization
  • Associated embedding cache and helper functions also removed

0.26.9 - 2026-01-22

Fixed

  • v0.25.0 Migration Failure: Fixed "Table 'documents' already exists" error during migration caused by held table references preventing drop_table() from succeeding. Added recovery logic to restore documents from staging table if a previous migration attempt failed mid-way.

0.26.8 - 2026-01-22

Added

  • Jina Reranker v3: Added support for Jina reranking with API mode (provider: jina) and local inference (provider: jina-local, requires [jina] extra)
  • Model Downloads: download-models now pre-downloads HuggingFace models for sentence-transformers, mxbai, and jina-local
  • Reranker Factory: Removed unreliable id(config)-based caching from get_reranker(); factory now always instantiates fresh

Changed

  • Agent Search Result Display: Search results now show rank position instead of raw scores
  • SearchResult.format_for_agent() accepts optional rank and total parameters
  • Output changes from (score: 0.02) to [rank 1 of 5] when rank is provided
  • Prevents LLMs from misinterpreting low RRF hybrid search scores as "2% relevant"
  • QA and Research agents updated to pass rank/total to formatted results
  • Agent prompts updated to reference rank-based ordering instead of scores

Fixed

  • Test Cassette Organization: Consolidated all VCR cassettes to tests/cassettes/
  • Environment Loading: Fixed .env file loading to search from current working directory instead of source file directory (#250) - thanks @tianyicui

0.26.7 - 2026-01-20

Added

  • OCR Engine Selection: New ocr_engine option in conversion_options to explicitly select OCR backend (#246)
  • Supported engines: auto (default), easyocr, rapidocr, tesseract, tesserocr, ocrmac
  • Works with both docling-local and docling-serve converters
  • Fixes inconsistent OCR engine selection between docling-serve startup and conversion requests

Removed

  • A2A Example: Removed examples/a2a-server/ A2A protocol server example
  • Stale Example References: Cleaned up references to removed ag-ui-research example from documentation

Changed

  • MCP Error Handling: MCP tools now let exceptions propagate naturally; FastMCP converts them to proper MCP error responses
  • Chunk Contextualization: Consolidated duplicate contextualize logic into Chunk.contextualize_content() method
  • Type Checker: Replaced pyright with ty, Astral's extremely fast Python type checker
  • Added explicit Agent[Deps, Output] type annotations to all pydantic-ai agents for better type inference
  • Removed ~24 unnecessary # type: ignore comments that ty correctly infers
  • Dependencies: Updated to latest versions
  • pydantic-ai-slim: 1.39.0 → 1.44.0
  • docling: 2.67.0 → 2.68.0
  • pathspec: 0.12.1 → 1.0.3
  • textual: 7.0.0 → 7.3.0
  • datasets: 4.4.2 → 4.5.0
  • ruff: 0.14.11 → 0.14.13
  • opencv-python-headless: 4.12.0.88 → 4.13.0.90

Fixed

  • Chat TUI: Fixed crash when logfire is installed but user is not authenticated (#247)

0.26.6 - 2026-01-19

Changed

  • Explicit Database Migrations: Database migrations are no longer applied automatically on open
  • Opening a database with pending migrations now raises MigrationRequiredError with a clear message
  • New haiku-rag migrate command to explicitly apply pending migrations
  • Version-only updates (no schema changes) are applied silently in writable mode
  • New skip_migration_check parameter on Store for tools that need to bypass the check
  • Store.migrate() method returns list of applied migration descriptions

0.26.5 - 2026-01-16

Added

  • Background Context Support: Pass background context to agents via CLI or Python API
  • haiku-rag ask --context "..." --context-file path for Q&A with background context
  • haiku-rag research --context "..." --context-file path for research with background context
  • haiku-rag chat --context "..." --context-file path for chat sessions with persistent context
  • ResearchContext(background_context="...") for Python API usage
  • ChatSessionState(background_context="...") for chat agent sessions
  • Context is included in agent system prompts and research graph planning
  • Frontend Background Context: Settings panel in the chat app to configure persistent background context
  • Context is stored in localStorage and sent with each conversation
  • Frontend Linting: Added Biome for linting and formatting the frontend codebase

0.26.4 - 2026-01-15

Added

  • AGUI_STATE_KEY Constant: Exported AGUI_STATE_KEY ("haiku.rag.chat") from haiku.rag.agents.chat for namespaced AG-UI state emission
  • Enables integrators to use a consistent key when combining haiku.rag with other agents
  • Backend, TUI, and frontend now use this key for state emission and extraction

0.26.3 - 2026-01-15

Added

  • Enhanced Database Info: haiku-rag info now displays pydantic-ai version and docling-document schema version
  • Keyed State Emission for Chat Agent: New state_key parameter in ChatDeps for namespaced AG-UI state snapshots
  • When set, tools emit {state_key: snapshot} instead of bare state, enabling state merging when multiple agents share state
  • Default None preserves backwards compatibility (bare state emission)
  • Page Image Generation Control: New generate_page_images option in ConversionOptions to control PDF page image extraction

Changed

  • CLI Error Handling: Commands (rebuild, vacuum, create-index, ask, research) now propagate errors with proper exit codes instead of swallowing exceptions

Fixed

  • Embed-only rebuild with changed vector dimensions: Fixed haiku-rag rebuild --embed-only failing when the configured embedding model has different dimensions than the database
  • Store now reads stored vector dimension when opening existing databases, allowing chunks to be read regardless of current config
  • _rebuild_embed_only recreates the chunks table to handle dimension changes
  • generate_page_images: bool = True - Enable/disable rendered page images (used by visualize_chunk())
  • Works with both docling-local and docling-serve converters
  • For docling-serve, maps to image_export_mode API parameter (embedded/placeholder)
  • Note: generate_picture_images (embedded figures/diagrams) works with local converter but has limited support in docling-serve

0.26.2 - 2026-01-13

Changed

  • Dependencies: Updated docling dependencies for latest docling-serve compatibility (#229)
  • docling-core: 2.57.0 → 2.59.0 (supports schema 1.9.0)
  • docling: 2.65.0 → 2.67.0

0.26.1 - 2026-01-13

Fixed

  • Docling Schema Version Mismatch: Fixed incompatibility between docling and docling-core causing ValidationError: Doc version 1.9.0 incompatible with SDK schema version 1.8.0 when adding documents (#229)
  • Root cause: docling-core was reverted to 2.57.0 (schema 1.8.0) for docling-serve compatibility, but docling remained at 2.67.0 (schema 1.9.0)
  • Fix: Reverted docling from 2.67.0 to 2.65.0 to match docling-core schema version

0.26.0 - 2026-01-13

Added

  • Conversational RAG Application: Full-stack application (app/) with CopilotKit frontend and pydantic-ai AG-UI backend
  • Next.js frontend with chat interface, citation display, and visual grounding
  • Starlette backend using pydantic-ai's native AGUIAdapter for streaming
  • Docker Compose setup for development (docker-compose.dev.yml) and production
  • Logfire integration for debugging LLM calls
  • SSE heartbeat to prevent connection timeouts
  • Chat Agent (haiku.rag.agents.chat): New conversational RAG agent optimized for multi-turn chat
  • create_chat_agent() factory function for creating chat agents with AG-UI support
  • SearchAgent for internal query expansion with deduplication
  • ChatDeps and ChatSessionState for session management
  • CitationInfo and QAResponse models for structured responses
  • Natural language document filtering via build_document_filter()
  • Configurable search limit per agent
  • Chat TUI (haiku-rag chat): Terminal-based chat interface using Textual
  • Single chat window with inline tool calls and expandable citations
  • Visual grounding (v key) reuses inspector's VisualGroundingModal
  • Database info (i key) shows document/chunk counts and storage info
  • Keybindings: q quit, Ctrl+L clear chat, Escape focus input
  • Q/A History Management: Intelligent conversation history with semantic ranking
  • FIFO queue with 50 max entries
  • Embedding cache to avoid re-embedding Q/A pairs
  • rank_qa_history_by_similarity() returns top-K most relevant history entries
  • Confidence filtering to exclude low-confidence answers from context
  • Conversational Research Graph: Simplified single-iteration research graph for chat
  • build_conversational_graph() optimized for conversational Q&A
  • Context-aware planning (generates fewer sub-questions when history exists)
  • ConversationalAnswer output type with direct answer and citations

Changed

  • BREAKING: Module Reorganization: Consolidated all agent code under haiku.rag.agents
  • Moved haiku.rag.qahaiku.rag.agents.qa
  • Moved haiku.rag.graph.researchhaiku.rag.agents.research
  • Added haiku.rag.agents.chat module with conversational RAG agent
  • Deleted haiku.rag.graph module (research graph now at haiku.rag.agents.research.graph)

Removed

  • BREAKING: Custom AG-UI Infrastructure: Removed custom AG-UI event handling in favor of pydantic-ai's native AG-UI support
  • Deleted haiku.rag.graph.agui module (AGUIEmitter, AGUIConsoleRenderer, stream_graph(), create_agui_server())
  • Removed --agui flag from serve command
  • Removed --verbose flags from ask and research commands
  • Removed --interactive flag from research command
  • Removed AGUIConfig from configuration
  • Deleted cli_chat.py interactive chat module
  • Research graph now uses graph.run() directly instead of stream_graph()
  • For AG-UI streaming, use pydantic-ai's native AGUIAdapter with ToolReturn and StateSnapshotEvent (see app/backend/ for example)
  • AG-UI Research Example: Removed examples/ag-ui-research/ (replaced by app/)

0.25.0 - 2026-01-12

Fixed

  • Large Document Storage Overflow: Fixed "byte array offset overflow" panic when vacuuming/rebuilding databases with many large PDF documents (#225)
  • Root cause: Arrow's 32-bit string column offsets limited to ~2GB per fragment
  • Changed docling_document_json (string) to docling_document (bytes) with large_binary Arrow type (64-bit offsets)
  • Added gzip compression for DoclingDocument JSON (~1.4x compression ratio)
  • Migration automatically compresses existing documents in batches to avoid memory issues
  • Breaking: Migration is destructive - all table version history is lost after upgrade

Changed

  • Dependencies: Updated lancedb 0.26.0 → 0.26.1, docling 2.65.0 → 2.67.0

Removed

  • Legacy Migrations: Removed obsolete database migration files (v0_9_3.py, v0_10_1.py, v0_19_6.py). These migrations were for versions prior to 0.20.0 and are no longer needed since the current release requires a database rebuild anyway.

0.24.2 - 2026-01-08

Fixed

  • Base64 Images in Expanded Context: Fixed base64 image data leaking into expanded search results when expand_context() processed PictureItem objects. The issue was PictureItem.export_to_markdown() defaulting to EMBEDDED mode. Now explicitly uses PLACEHOLDER mode to prevent base64 data while still including VLM descriptions and captions.

0.24.1 - 2026-01-08

Fixed

  • OpenAI Non-Reasoning Models: Fixed reasoning_effort parameter being sent to non-reasoning OpenAI models (gpt-4o, gpt-4o-mini), causing 400 errors. Now correctly detects reasoning models (o1, o3 series) using pydantic-ai's model profile.
  • Bedrock Non-Reasoning Models: Fixed same issue for OpenAI models on Bedrock.

0.24.0 - 2026-01-07

Added

  • VLM Picture Description: Describe embedded images using Vision Language Models during document conversion
  • Images are sent to a VLM for automatic description via OpenAI-compatible API
  • Descriptions become searchable text, improving RAG retrieval for visual content
  • Configure via processing.conversion_options.picture_description with enabled, model, timeout, max_tokens
  • Default prompt customizable via prompts.picture_description
  • Requires OpenAI-compatible /v1/chat/completions endpoint (Ollama, OpenAI, vLLM, LM Studio)

0.23.2 - 2026-01-05

Fixed

  • AG-UI Concurrent Step Tracking: Emitter now correctly tracks multiple concurrent steps (#216)

Changed

  • Dependencies: Updated core and development dependencies

0.23.1 - 2025-12-29

Added

  • Contextualized FTS Search: Full-text search now includes section headings
  • New content_fts column stores contextualized content (headings + body text)
  • FTS index now searches content_fts for better keyword matching on section context
  • Original content column preserved for display and context expansion
  • Migration automatically populates content_fts for existing databases
  • GitHub Actions CI: Test workflow runs pytest, pyright, and ruff on push/PR to main
  • VCR Cassette Recording: Integration tests use recorded HTTP responses for deterministic CI runs
  • LLM tests (QA, embeddings, research graph) replay from cassettes without real API calls
  • docling-serve tests run without Docker container in CI
  • Uses pytest-recording with custom JSON body serializer

0.23.0 - 2025-12-26

Added

  • Prompt Customization: Configure agent prompts via prompts config section
  • domain_preamble: Prepended to all agent prompts for domain context
  • qa: Full replacement for QA agent prompt
  • synthesis: Full replacement for research synthesis prompt

Changed

  • Embeddings: Migrated to pydantic-ai's embeddings module
  • Uses pydantic-ai v1.39.0+ embeddings with instrumentation and token counting support
  • Explicit embed_query() and embed_documents() API for query/document distinction
  • New providers available: Cohere (cohere:), SentenceTransformers (sentence-transformers:)
  • VoyageAI refactored to extend pydantic-ai's EmbeddingModel base class
  • Configuration: Added base_url to ModelConfig and EmbeddingModelConfig
  • Enables custom endpoints for OpenAI-compatible providers (vLLM, LM Studio, etc.)
  • Model-level base_url takes precedence over provider config

Deprecated

  • vLLM and LM Studio providers: Use openai provider with base_url instead
  • provider: vllmprovider: openai with base_url: http://localhost:8000/v1
  • provider: lm_studioprovider: openai with base_url: http://localhost:1234/v1

Removed

  • Deleted obsolete embedder implementations: ollama.py, openai.py, vllm.py, lm_studio.py, base.py
  • Removed VLLMConfig and LMStudioConfig from configuration (use base_url in model config instead)

0.22.0 - 2025-12-19

Added

  • Read-Only Mode: Global --read-only CLI flag for safe database access without modifications
  • Blocks all write operations at the Store layer
  • Skips database upgrades and settings saves on open
  • Excludes write tools (add_document_*, delete_document) from MCP server
  • Disables file monitor with warning when --read-only is used with serve --monitor
  • Time Travel: Query the database as it existed at a previous point in time
  • Global --before CLI flag accepts datetime strings (ISO 8601 or date-only)
  • Automatically enables read-only mode when time-traveling
  • New history command shows version history for database tables
  • Useful for debugging and auditing
  • Supported throughout: CLI, Client, App, Inspector

Fixed

  • File Monitor Path Validation: Monitor now validates directories exist before watching (#204)
  • Provides clear error message pointing to haiku.rag.yaml configuration
  • Prevents cryptic FileNotFoundError: No path was found from watchfiles
  • Docker Documentation: Improved Docker setup instructions
  • Added volume mount examples for config file and documents directory
  • Clarified that monitor.directories must use container paths, not host paths

Changed

  • Dependencies: Updated core dependencies
  • pydantic-ai-slim: 1.27.0 → 1.36.0 (FileSearchTool, web chat UI, GPT-5.2 support, prompt caching)
  • lancedb: 0.25.3 → 0.26.0
  • docling: 2.64.0 → 2.65.0
  • docling-core: 2.54.0 → 2.57.0

0.21.0 - 2025-12-18

Added

  • Interactive Research Mode: Human-in-the-loop research using graph-based decision nodes
  • haiku-rag research --interactive starts conversational CLI chat
  • Natural language interpretation for user commands (search, modify questions, synthesize)
  • Chat with assistant before starting research, and during decision points
  • Review collected answers and pending questions at each decision point
  • Add, remove, or modify sub-questions through natural conversation
  • New human_decide graph node emits AG-UI tool calls (TOOL_CALL_START/ARGS/END) for frontend integration
  • New emit_tool_call_start(), emit_tool_call_args(), emit_tool_call_end() AG-UI event helpers
  • New AGUIEmitter.emit() method for direct event emission
  • AG-UI Research Example: Human-in-the-loop research with client-side tool calling
  • Frontend handles human_decision tool calls via AG-UI TOOL_CALL_* events
  • Tool results sent directly to backend /v1/research/stream endpoint
  • Backend queues decisions and continues the research graph
  • HotpotQA Evaluation: Added HotpotQA dataset adapter for multi-hop QA benchmarks
  • Extracts unique documents from validation set context paragraphs
  • Uses MAP for retrieval evaluation (multiple supporting documents per question)
  • Run with evaluations hotpotqa
  • Plain Text Format: Added format="plain" for text conversion
  • Use when content is plain text without markdown/HTML structure
  • Falls back gracefully when docling cannot detect markdown format in content
  • Supported in create_document(), convert(), and all converter classes

Changed

  • AG-UI Events: Replaced custom event classes with ag_ui.core types
  • Removed haiku.rag.graph.agui.events module
  • Event factory functions (emit_*) now wrap official ag_ui.core event classes
  • Chunker Sets Order: Chunkers now set chunk.order directly
  • Unified Research Graph: Simplified and unified research and deep QA into a single configurable graph
  • Removed analyze_insights node - graph now flows directly from collect_answers to decide
  • Simplified EvaluationResult to: is_sufficient, confidence_score, reasoning, new_questions
  • Simplified ResearchContext - removed insight/gap tracking methods
  • ask --deep now uses research graph with max_iterations=2, confidence_threshold=0.0
  • ask --deep output now shows executive summary, key findings, and sources
  • Added include_plan parameter to build_research_graph() for plan-less execution
  • Added max_iterations and confidence_threshold overrides to ResearchState.from_config()
  • Improved Synthesis Prompt: Updated synthesis agent prompt to produce direct answers
  • Executive summary now directly answers the question instead of describing the report
  • Added explicit examples of good vs bad output style
  • Evaluations Vacuum Strategy: populate_db now uses periodic vacuum to prevent disk exhaustion with large datasets
  • Disables auto_vacuum during population, vacuums every N documents with retention=0
  • New --vacuum-interval CLI option (default: 100) to control vacuum frequency
  • Prevents disk space issues when building databases with thousands of documents (e.g., HotpotQA)
  • Benchmarks Documentation: Restructured benchmarks.md for clarity
  • Added dedicated Methodology section explaining MRR, MAP, and QA Accuracy metrics
  • Organized results by dataset with retrieval and QA subsections

Removed

  • Deep QA Graph: Removed haiku.rag.graph.deep_qa module entirely
  • Use build_research_graph() with appropriate parameters instead
  • ask --deep CLI command now uses research graph internally
  • Insight/Gap Tracking: Removed over-engineered insight and gap tracking from research graph
  • Removed InsightRecord, GapRecord, InsightAnalysis, InsightStatus, GapSeverity models
  • Removed format_analysis_for_prompt() helper
  • Removed INSIGHT_AGENT_PROMPT from prompts

0.20.2 - 2025-12-12

Fixed

  • LLM Schema Compliance: Improved prompts to prevent LLMs from returning objects instead of plain strings for list[str] fields
  • All graph prompts now explicitly state that list fields must contain plain strings only
  • Added missing query and confidence fields to search agent output format documentation
  • Fixes validation errors with less capable models that ignore JSON schema constraints
  • AG-UI Frontend Types: Fixed TypeScript interfaces in ag-ui-research example to match backend Python models
  • EvaluationResult: confidenceconfidence_score, should_continueis_sufficient, gaps_identifiedgaps, follow_up_questionsnew_questions, added key_insights
  • ResearchReport: questiontitle, summaryexecutive_summary, findingsmain_findings, removed insights_used/methodology, added limitations/recommendations/sources_summary
  • Updated Final Report UI to display new fields (Recommendations, Limitations, Sources)
  • Citation Formatting: Citations in CLI now render properly with Rich panels
  • Content is rendered as markdown with proper code block formatting
  • No longer truncates or flattens newlines in citation content

0.20.1 - 2025-12-11

Added

  • Search Filter for Graphs: Research and Deep QA graphs now support search_filter parameter to restrict searches to specific documents
  • Set state.search_filter to a SQL WHERE clause (e.g., "id IN ('doc1', 'doc2')") before running the graph
  • Enables document-scoped research workflows
  • CLI: haiku-rag research "question" --filter "uri LIKE '%paper%'"
  • CLI: haiku-rag ask "question" --filter "title = 'My Doc'"
  • Python: client.ask(question, filter="...") and agent.answer(question, filter="...")
  • AG-UI Research Example: Added bidirectional state demonstration with document filter
  • New /api/documents endpoint to list available documents
  • Frontend document selector component with search and multi-select
  • Demonstrates client-to-server state flow via AG-UI protocol
  • Inspector Info Modal: New i keyboard shortcut opens a modal displaying database information

Changed

  • Inspector Lazy Loading: Chunks panel now loads chunks in batches of 50 with infinite scroll
  • Fixes unresponsive UI when viewing documents with large numbers of chunks
  • New ChunkRepository.get_by_document_id() pagination with limit and offset parameters
  • New ChunkRepository.count_by_document_id() method

0.20.0 - 2025-12-10

Added

  • DoclingDocument Storage: Full DoclingDocument JSON is now stored with each document, enabling rich context and visual grounding
  • Documents store the complete DoclingDocument structure (JSON) and schema version
  • Chunks store metadata with JSON pointer references (doc_item_refs), semantic labels, section headings, and page numbers
  • New ChunkMetadata model for structured chunk provenance: doc_item_refs, headings, labels, page_numbers
  • Document.get_docling_document() method to parse stored DoclingDocument
  • ChunkMetadata.resolve_doc_items() to resolve JSON pointer refs to actual DocItem objects
  • ChunkMetadata.resolve_bounding_boxes() for visual grounding with page coordinates
  • LRU cache (100 documents) for parsed DoclingDocument objects to avoid repeated JSON parsing
  • Enhanced Search Results: search() and expand_context() now return full provenance information
  • SearchResult includes page_numbers, headings, labels, and doc_item_refs
  • QA and research agents use provenance for better citations (page numbers, section headings)
  • Type-Aware Context Expansion: expand_context() now uses document structure for intelligent expansion
  • Structural content (tables, code blocks, lists) expands to complete structures regardless of chunking
  • Text content uses radius-based expansion via text_context_radius setting
  • max_context_items and max_context_chars settings control expansion limits
  • SearchResult.format_for_agent() method formats expanded results with metadata for LLM consumption
  • Visual Grounding: View page images with highlighted bounding boxes for chunks
  • Inspector modal with keyboard navigation between pages
  • CLI command: haiku-rag visualize <chunk_id>
  • Requires textual-image dependency and terminal with image support
  • Processing Primitives: New methods for custom document processing pipelines
  • convert() - Convert files, URLs, or text to DoclingDocument
  • chunk() - Chunk a DoclingDocument into Chunk objects
  • contextualize() - Prepend section headings to chunk content for embedding
  • embed_chunks() - Generate embeddings for chunks
  • New import_document() Method: Import pre-processed documents with custom chunks
  • Accepts DoclingDocument directly for rich metadata (visual grounding, page numbers)
  • Use when document conversion, chunking, or embedding were done externally
  • Chunks without embeddings are automatically embedded
  • Automatic Chunk Embedding: import_document() and update_document() automatically embed chunks that don't have embeddings
  • Pass chunks with or without embeddings - missing embeddings are generated
  • Chunks with pre-computed embeddings are stored as-is
  • Format Parameter for Text Conversion: New format parameter for convert() and create_document() to specify content type
  • Supports "md" (default) for markdown and "html" for HTML content
  • HTML format preserves document structure (headings, lists, sections) in DoclingDocument
  • Enables proper parsing of HTML content that was previously treated as plain text
  • Inspector Context Modal: Press c in the inspector to view expanded context for the selected chunk
  • Auto-Vacuum Configuration: New storage.auto_vacuum setting to control automatic vacuuming behavior
  • When true (default), vacuum runs automatically after document create/update operations and rebuilds
  • When false, vacuum only runs via explicit haiku-rag vacuum command
  • Disabling can help avoid potential crashes in high-concurrency scenarios due to LanceDB race conditions

Changed

  • BREAKING: create_document() API: Removed chunks parameter
  • create_document() now always processes content (converts, chunks, embeds)
  • Use import_document() for pre-processed documents with custom chunks
  • BREAKING: update_document() API: Unified with update_document_fields()
  • Old: update_document(document) - pass modified Document object
  • New: update_document(document_id, content=, metadata=, chunks=, title=, docling_document=)
  • content and docling_document are mutually exclusive
  • BREAKING: Chunker Interface: DocumentChunker.chunk() now returns list[Chunk] instead of list[str]
  • Chunks include structured metadata (doc_item_refs, labels, headings, page_numbers)
  • Search Config: New settings in search section for search behavior and context expansion
  • search.limit - Default number of search results (default: 5). Used by CLI, MCP server, and API when no limit specified
  • search.context_radius - DocItems before/after to include for text content expansion (default: 0)
  • search.max_context_items - Maximum items in expanded context (default: 10)
  • search.max_context_chars - Maximum characters in expanded context (default: 10000)
  • Rebuild Performance: Batched database writes during rebuild command reduce LanceDB versions by ~98%
  • All rebuild modes (FULL, RECHUNK, EMBED_ONLY) now batch writes across documents
  • Eliminates redundant per-document chunk deletions and vacuum calls
  • Significantly reduces storage overhead and improves rebuild speed for large databases
  • Embedding Architecture: Moved embedding generation from ChunkRepository to client layer
  • Repository is now a pure persistence layer
  • Client handles embedding via _ensure_chunks_embedded()
  • Chunk Text Storage: Chunks store raw text; headings prepended only at embedding time
  • Stored chunk content stays clean without duplicate heading prefixes
  • Local and serve chunkers now produce identical output
  • Citation Models: Introduced RawSearchAnswer for LLM output, SearchAnswer with resolved citations
  • Page Image Generation: Always enabled for local docling converter (required for visual grounding)
  • Download Models Progress: haiku-rag download-models now shows real-time progress with Rich progress bars for Ollama model downloads

Removed

  • BREAKING: markdown_preprocessor Config Option: Use processing primitives (convert(), chunk(), embed_chunks()) for custom pipelines
  • update_document_fields(): Merged into update_document()

Migration

This release requires a database rebuild to populate the new DoclingDocument fields:

haiku-rag rebuild

Existing documents without DoclingDocument data will work but won't have provenance information.

0.19.6 - 2025-12-03

Changed

  • BREAKING: Explicit Database Creation: Databases must now be explicitly created before use
  • New haiku-rag init command creates a new empty database
  • Python API: HaikuRAG(path, create=True) to create database programmatically
  • Operations on non-existent databases raise FileNotFoundError
  • BREAKING: Embeddings Configuration: Restructured to nested EmbeddingModelConfig
  • Config path changed from embeddings.{provider, model, vector_dim} to embeddings.model.{provider, name, vector_dim}
  • Automatic migration upgrades existing databases to new format
  • Database Migrations: Always run when opening an existing database

0.19.5 - 2025-12-01

Changed

  • Rebuild Performance: Optimized rebuild --embed-only to use batch updates via LanceDB's merge_insert instead of individual chunk updates, and skip chunks with unchanged embeddings

0.19.4 - 2025-11-28

Added

  • Rebuild Modes: New options for rebuild command to control what gets rebuilt
  • --embed-only: Only regenerate embeddings, keeping existing chunks (fastest option when changing embedding model)
  • --rechunk: Re-chunk from existing document content without accessing source files
  • Default (no flag): Full rebuild with source file re-conversion
  • Python API: rebuild_database(mode=RebuildMode.EMBED_ONLY | RECHUNK | FULL)

0.19.3 - 2025-11-27

Changed

  • Async Chunker: DoclingServeChunker now uses httpx.AsyncClient instead of sync requests

Fixed

  • OCR Options: Fixed DoclingLocalConverter using base OcrOptions class which docling's OCR factory doesn't recognize. Now uses OcrAutoOptions for automatic OCR engine selection.
  • Dependencies: Added opencv-python-headless to the docling optional dependency for table structure detection.

0.19.2 - 2025-11-27

Changed

  • Async Converters: Made document converters fully async
  • BaseConverter.convert_file() and convert_text() are now async methods
  • DoclingLocalConverter wraps blocking Docling operations with asyncio.to_thread()
  • DoclingServeConverter now uses httpx.AsyncClient instead of sync requests
  • Async Model Prefetch: prefetch_models() is now async
  • Uses httpx.AsyncClient for Ollama model pulls
  • Wraps blocking Docling and HuggingFace downloads with asyncio.to_thread()

0.19.1 - 2025-11-26

Added

  • LM Studio Provider: Added support for LM Studio as a provider for embeddings and QA/research models
  • Configure with provider: lm_studio in embeddings, QA, or research model settings
  • Supports thinking control for reasoning models (gpt-oss, etc.)
  • Default base URL: http://localhost:1234

Fixed

  • Configuration: Fixed init-config command generating invalid configuration files (#165)
  • Refactored generate_default_config() to use Pydantic model serialization instead of manual dict construction
  • Updated qa, research, and reranking sections to use new ModelConfig structure

0.19.0 - 2025-11-25

Added

  • Model Customization: Added support for per-model configuration settings
  • New enable_thinking parameter to control reasoning behavior (true/false/None)
  • Support for temperature and max_tokens settings on QA and research models
  • All settings apply to any provider that supports them
  • Database Inspector: New inspect CLI command launches interactive TUI for browsing documents and chunks & searching
  • Evaluations: Added evaluations CLI script for running benchmarks (replaces python -m evaluations.benchmark)
  • Evaluations: Added --db option to override evaluation database path
  • Default database location moved to haiku.rag data directory:
    • macOS: ~/Library/Application Support/haiku.rag/evaluations/dbs/
    • Linux: ~/.local/share/haiku.rag/evaluations/dbs/
    • Windows: C:/Users/<USER>/AppData/Roaming/haiku.rag/evaluations/dbs/
  • Previously stored in evaluations/data/ within the repository
  • Evaluations: Added comprehensive experiment metadata tracking for better reproducibility
  • Records dataset name, test case count, and all model configurations
  • Tracks embedder settings: provider, model, and vector dimensions
  • Tracks QA model: provider and model name
  • Tracks judge model: provider and model name for LLM evaluation
  • Tracks processing parameters: chunk_size and context_chunk_radius
  • Tracks retrieval configuration: retrieval_limit for number of chunks retrieved
  • Tracks reranking configuration: rerank_provider and rerank_model
  • Enables comparison of evaluation runs with different configurations in Logfire
  • Evaluations: Refactored retrieval evaluation to use pydantic-ai experiment framework
  • New evaluators module with MRREvaluator (Mean Reciprocal Rank) and MAPEvaluator (Mean Average Precision)
  • Retrieval benchmarks now use Dataset.evaluate() with full Logfire experiment tracking
  • Dataset specifications now declare their retrieval evaluator (MRR for RepliQA, MAP for Wix)
  • Replaced Recall@K and Success@K with industry-standard MRR and MAP metrics
  • Unified evaluation framework for both retrieval and QA benchmarks
  • AG-UI Events: Enhanced ActivitySnapshot events with richer structured data
  • Added stepName field to identify which graph node emitted each activity
  • Added structured fields to activity content while preserving backward-compatible message field:
    • Planning: sub_questions - list of sub-question strings
    • Searching: query - the search query, confidence - answer confidence (on success), error - error message (on failure)
    • Analyzing (research): insights - list of insight objects, gaps - list of gap objects, resolved_gaps - list of resolved gap strings
    • Evaluating (research): confidence - confidence score, is_sufficient - sufficiency flag
    • Evaluating (deep QA): is_sufficient - sufficiency flag, iterations - iteration count

Changed

  • Evaluations: Renamed --qa-limit CLI parameter to --limit, now applies to both retrieval and QA benchmarks
  • Evaluations: Retrieval evaluator selection moved from runtime logic to dataset configuration

0.18.0 - 2025-11-21

Added

  • Manual Vector Indexing: New create-index CLI command for explicit vector index creation
  • Creates IVF_PQ indexes
  • Requires minimum 256 chunks (LanceDB training data requirement)
  • New search.vector_index_metric config option: cosine (default), l2, or dot
  • New search.vector_refine_factor config option (default: 30) for accuracy/speed tradeoff
  • Indexes not created automatically during ingestion to avoid performance degradation
  • Manual rebuilding required after adding significant new data
  • Enhanced Info Command: haiku-rag info now shows storage sizes and vector index statistics
  • Displays storage size for documents and chunks tables in human-readable format
  • Shows vector index status (exists/not created)
  • Shows indexed and unindexed chunk counts for monitoring index staleness

Changed

  • BREAKING: Default Embedding Model: Changed default embedding model from qwen3-embedding to qwen3-embedding:4b with vector dimension 2560 (previously 4096)
  • New installations will use the smaller, more efficient 4B parameter model by default
  • Action required: Existing databases created with the old default will be incompatible. Users must either:
    • Explicitly set embeddings.model: "qwen3-embedding" and embeddings.vector_dim: 4096 in their config to maintain compatibility with existing databases
    • Or run haiku-rag rebuild to re-embed all documents with the new default
  • This change provides better performance for most use cases while reducing resource requirements
  • Evaluations: Improved evaluation dataset naming and simplified evaluator configuration
  • EvalDataset now accepts dataset name for better organization in Logfire
  • Added --name CLI parameter to override evaluation run names
  • Removed IsInstance evaluator, using only LLMJudge for QA evaluation
  • Search Accuracy: Applied refine_factor to vector and hybrid searches for improved accuracy
  • Retrieves refine_factor * limit candidates and re-ranks in memory
  • Higher values increase accuracy but slow down queries

Fixed

  • AG-UI Activity Events: Activity events now correctly use structured dict content instead of strings
  • Graph Configuration: Graph builder functions now properly accept and use non-global config (#149)
  • build_research_graph() and build_deep_qa_graph() now pass config to all agents and model creation
  • get_model() utility function accepts config parameter (defaults to global Config)
  • Allows creating multiple graphs with different configurations in the same application

0.17.2 - 2025-11-19

Added

  • Document Update API: New update_document_fields() method for partial document updates
  • Update individual fields (content, metadata, title, chunks) without fetching full document
  • Support for custom chunks or auto-generation from content

Changed

  • Chunk Creation: ChunkRepository.create() now accepts both single chunks and lists for batch insertion
  • Batch insertion reduces LanceDB version creation when adding multiple chunks with custom chunks
  • Batch embedding generation for improved performance with multiple chunks
  • Updated core dependencies

0.17.1 - 2025-11-18

Added

  • Conversion Options: Fine-grained control over document conversion for both local and remote converters
  • New conversion_options config section in ProcessingConfig
  • OCR settings: do_ocr, force_ocr, ocr_lang for controlling OCR behavior
  • Table extraction: do_table_structure, table_mode (fast/accurate), table_cell_matching
  • Image settings: images_scale to control image resolution
  • Options work identically with both docling-local and docling-serve converters

Changed

  • Increase reranking candidate retrieval multiplier from 3x to 10x for improved result quality
  • Docker Images: Main haiku.rag image no longer automatically built and published
  • Conversion Options: Removed the legacy pdf_backend setting; docling now chooses the optimal backend automatically

0.17.0 - 2025-11-17

Added

  • Remote Processing: Support for docling-serve as remote document processing and chunking service
  • New converter config option: docling-local (default) or docling-serve
  • New chunker config option: docling-local (default) or docling-serve
  • New providers.docling_serve config section with base_url, api_key, and timeout
  • Comprehensive error handling for connection, timeout, and authentication issues
  • Chunking Strategies: Support for both hybrid and hierarchical chunking
  • New chunker_type config option: hybrid (default) or hierarchical
  • Hybrid chunking: Structure-aware splitting that respects document boundaries
  • Hierarchical chunking: Preserves document hierarchy for nested documents
  • Table Serialization Control: Configurable table representation in chunks
  • New chunking_use_markdown_tables config option (default: false)
  • false: Tables serialized as narrative text ("Value A, Column 2 = Value B")
  • true: Tables preserved as markdown format with structure
  • Chunking Configuration: Additional chunking control options
  • New chunking_merge_peers config option (default: true) to merge undersized successive chunks
  • Docker Images: Two Docker images for different deployment scenarios
  • haiku.rag: Full image with all dependencies for self-contained deployments
  • haiku.rag-slim: Minimal image designed for use with external docling-serve
  • Multi-platform support (linux/amd64, linux/arm64)
  • Docker Compose examples with docling-serve integration
  • Automated CI/CD workflows for both images
  • Build script (scripts/build-docker-images.sh) for local multi-platform builds

Changed

  • BREAKING: Chunking Tokenizer: Switched from tiktoken to HuggingFace tokenizers for consistency with docling-serve
  • Default tokenizer changed from tiktoken "gpt-4o" to "Qwen/Qwen3-Embedding-0.6B"
  • New chunking_tokenizer config option in ProcessingConfig for customization
  • download-models CLI command now also downloads the configured HuggingFace tokenizer
  • Docker Examples: Updated examples to demonstrate remote processing
  • examples/docker now uses slim image with docling-serve
  • examples/ag-ui-research backend uses slim image with docling-serve
  • Configuration examples include remote processing setup

0.16.1 - 2025-11-14

Changed

  • Evaluations: Refactored QA benchmark to run entire dataset as single evaluation for better Logfire experiment tracking
  • Evaluations: Added .env file loading support via python-dotenv dependency

0.16.0 - 2025-11-13

Added

  • AG-UI Protocol Support: Full AG-UI (Agent-UI) protocol implementation for graph execution with event streaming
  • New AGUIEmitter class for emitting AG-UI events from graphs
  • Support for all AG-UI event types: lifecycle events (RUN_STARTED, RUN_FINISHED, RUN_ERROR), step events (STEP_STARTED, STEP_FINISHED), state updates (STATE_SNAPSHOT, STATE_DELTA), activity narration (ACTIVITY_SNAPSHOT), and text messages (TEXT_MESSAGE_CHUNK)
  • AGUIConsoleRenderer for rendering AG-UI event streams to terminal with Rich formatting
  • stream_graph() utility function for executing graphs with AG-UI event emission
  • State diff computation for efficient state synchronization
  • Delta State Updates: AG-UI emitter now supports incremental state updates via JSON Patch operations (STATE_DELTA events) to reduce bandwidth, configurable via use_deltas parameter (enabled by default)
  • AG-UI Server: Starlette-based HTTP server for serving graphs via AG-UI protocol
  • Server-Sent Events (SSE) streaming endpoint at /v1/agent/stream
  • Health check endpoint at /health
  • Full CORS support configurable via agui config section
  • create_agui_server() function for programmatic server creation
  • Deep QA AG-UI Support: Deep QA graph now fully supports AG-UI event streaming
  • Integration with AGUIEmitter for progress tracking
  • Step-by-step execution visibility via AG-UI events
  • CLI AG-UI Flag: New --agui flag for serve command to start AG-UI server
  • Graph Module: New unified haiku.rag.graph module containing all graph-related functionality
  • Common Graph Nodes: New factory functions (create_plan_node, create_search_node) in haiku.rag.graph.common.nodes for reusable graph components
  • AG-UI Research Example: New full-stack example (examples/ag-ui-research) demonstrating agent+graph architecture with CopilotKit frontend
  • Pydantic AI agent with research tool that invokes the research graph
  • Custom AG-UI streaming endpoint with anyio memory streams
  • React/Next.js frontend with split-pane UI showing live research state
  • Real-time progress tracking of questions, answers, insights, and gaps
  • Docker Compose setup for easy local development

Changed

  • Vacuum Retention: Default vacuum_retention_seconds increased from 60 seconds to 86400 seconds (1 day) for better version retention in typical workflows
  • BREAKING: Major refactoring of graph-related code into unified haiku.rag.graph module structure:
  • haiku.rag.researchhaiku.rag.graph.research
  • haiku.rag.qa.deephaiku.rag.graph.deep_qa
  • haiku.rag.aguihaiku.rag.graph.agui
  • haiku.rag.graph_commonhaiku.rag.graph.common
  • BREAKING: Research and Deep QA graphs now use AG-UI event protocol instead of direct console logging
  • Removed console and stream parameters from graph dependencies
  • All progress updates now emit through AGUIEmitter
  • BREAKING: ResearchState converted from dataclass to Pydantic BaseModel for JSON serialization and AG-UI compatibility
  • Research and Deep QA graphs now emit detailed execution events for better observability
  • CLI research command now uses AG-UI event rendering for --verbose output
  • Improved graph execution visibility with step-by-step progress tracking
  • Updated all documentation to reflect new import paths and AG-UI usage
  • Updated examples (ag-ui-research, a2a-server) to use new import paths

Fixed

  • Document Creation: Optimized create_document to skip unnecessary DoclingDocument conversion when chunks are pre-provided
  • FileReader: Error messages now include both original exception details and file path for easier debugging
  • Database Auto-creation: Read operations (search, list, get, ask, research) no longer auto-create empty databases. Write operations (add, add-src, delete, rebuild) still create the database as needed. This prevents the confusing scenario where a search query creates an empty database. Fixes issue #137.

Removed

  • BREAKING: Removed disable_autocreate config option - the behavior is now automatic based on operation type
  • BREAKING: Removed legacy ResearchStream and ResearchStreamEvent classes (replaced by AG-UI event protocol)

0.15.0 - 2025-11-07

Added

  • File Monitor: Orphan deletion feature - automatically removes documents from database when source files are deleted (enabled via monitor.delete_orphans config option, default: false)

Changed

  • Configuration: All CLI commands now properly support --config parameter for specifying custom configuration files
  • Configuration loading consolidated across CLI, app, and client with consistent resolution order
  • HaikuRAGApp and MCP server now accept config parameter for programmatic configuration
  • Updated CLI documentation to clarify global vs per-command options
  • BREAKING: Standardized configuration filename to haiku.rag.yaml in user directories (was incorrectly using config.yaml). Users with existing config.yaml in their user directory will need to rename it to haiku.rag.yaml

Fixed

  • File Monitor: Fixed incorrect "Updated document" logging for unchanged files - monitor now properly skips files when MD5 hash hasn't changed

Removed

  • BREAKING: A2A (Agent-to-Agent) protocol support has been moved to a separate self-contained package in examples/a2a-server/. The A2A server is no longer part of the main haiku.rag package. Users who need A2A functionality can install and run it from the examples directory with cd examples/a2a-server && uv sync.
  • BREAKING: Removed deprecated .env-based configuration system. The haiku-rag init-config --from-env command and load_config_from_env() function have been removed. All configuration must now be done via YAML files. Environment variables for API keys (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY) and service URLs (e.g., OLLAMA_BASE_URL) are still supported and can be set via .env files.

0.14.1 - 2025-11-06

Added

  • Migrated research and deep QA agents to use Pydantic Graph beta API for better graph execution
  • Automatic semaphore-based concurrency control for parallel sub-question processing
  • max_concurrency parameter for controlling parallel execution in research and deep QA (default: 1)

Changed

  • BREAKING: Research and Deep QA graphs now use pydantic_graph.beta instead of the class-based graph implementation
  • Refactored graph common patterns into graph_common module
  • Sub-questions now process using .map() for true parallel execution
  • Improved graph structure with cleaner node definitions and flow control
  • Pinned critical dependencies: docling-core, lancedb, docling

0.14.0 - 2024-11-05

Added

  • New haiku.rag-slim package with minimal dependencies for users who want to install only what they need
  • Evaluations package (haiku.rag-evals) for internal benchmarking and testing
  • Improved search filtering performance by using pandas DataFrames for joins instead of SQL WHERE IN clauses

Changed

  • BREAKING: Restructured project into UV workspace with three packages:
  • haiku.rag-slim - Core package with minimal dependencies
  • haiku.rag - Full package with all extras (recommended for most users)
  • haiku.rag-evals - Internal benchmarking and evaluation tools
  • Migrated from pydantic-ai to pydantic-ai-slim with extras system
  • Docling is now an optional dependency (install with haiku.rag-slim[docling])
  • Package metadata checks now use haiku.rag-slim (always present) instead of haiku.rag
  • Docker image optimized: removed evaluations package, reducing installed packages from 307 to 259
  • Improved vector search performance through optimized score normalization

Fixed

  • ImportError now properly raised when optional docling dependency is missing

0.13.3 - 2024-11-04

Added

  • Support for Zero Entropy reranker
  • Filter parameter to search() for filtering documents before search
  • Filter parameter to CLI search command
  • Filter parameter to CLI list command for filtering document listings
  • Config option to pass custom configuration files to evaluation commands
  • Document filtering now respects configured include/exclude patterns when using add-src with directories
  • Max retries to insight_agent when producing structured output

Fixed

  • CLI now loads .env files at startup
  • Info command no longer attempts to use deprecated .env settings
  • Documentation typos

0.13.2 - 2024-11-04

Added

  • Gitignore-style pattern filtering for file monitoring using pathspec
  • Include/exclude pattern documentation for FileMonitor

Changed

  • Moved monitor configuration to its own section in config
  • Improved configuration documentation
  • Updated dependencies

0.13.1 - 2024-11-03

Added

  • Initial version tracking