# Changelog

## Unreleased

### Removed

- A2A Example: Removed `examples/a2a-server/` A2A protocol server example
- Stale Example References: Cleaned up references to the removed `ag-ui-research` example from documentation

### Changed

- Type Checker: Replaced pyright with ty, Astral's extremely fast Python type checker
  - Added explicit `Agent[Deps, Output]` type annotations to all pydantic-ai agents for better type inference
  - Removed ~24 unnecessary `# type: ignore` comments that ty correctly infers
- Dependencies: Updated to latest versions
  - `pydantic-ai-slim`: 1.39.0 → 1.44.0
  - `docling`: 2.67.0 → 2.68.0
  - `pathspec`: 0.12.1 → 1.0.3
  - `textual`: 7.0.0 → 7.3.0
  - `datasets`: 4.4.2 → 4.5.0
  - `ruff`: 0.14.11 → 0.14.13
  - `opencv-python-headless`: 4.12.0.88 → 4.13.0.90

## [0.26.6] - 2026-01-19

### Changed

- Explicit Database Migrations: Database migrations are no longer applied automatically on open
  - Opening a database with pending migrations now raises `MigrationRequiredError` with a clear message
  - New `haiku-rag migrate` command to explicitly apply pending migrations
  - Version-only updates (no schema changes) are applied silently in writable mode
  - New `skip_migration_check` parameter on `Store` for tools that need to bypass the check
  - `Store.migrate()` method returns a list of applied migration descriptions (see the sketch below)

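A minimal sketch of the new explicit flow (the import paths and the exact `Store` constructor shape are assumptions, not verbatim API):

```python
from haiku.rag.store import Store  # import path is an assumption
from haiku.rag.store.migrations import MigrationRequiredError  # import path is an assumption

try:
    store = Store("my.lancedb")
except MigrationRequiredError:
    # Bypass the check, then apply pending migrations explicitly
    store = Store("my.lancedb", skip_migration_check=True)
    for description in store.migrate():
        print(f"applied: {description}")
```
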
## [0.26.5] - 2026-01-16

### Added

- Background Context Support: Pass background context to agents via the CLI or Python API (see the sketch after this list)
  - `haiku-rag ask --context "..." --context-file path` for Q&A with background context
  - `haiku-rag research --context "..." --context-file path` for research with background context
  - `haiku-rag chat --context "..." --context-file path` for chat sessions with persistent context
  - `ResearchContext(background_context="...")` for Python API usage
  - `ChatSessionState(background_context="...")` for chat agent sessions
  - Context is included in agent system prompts and research graph planning
- Frontend Background Context: Settings panel in the chat app to configure persistent background context
  - Context is stored in localStorage and sent with each conversation
- Frontend Linting: Added Biome for linting and formatting the frontend codebase

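A minimal sketch of the Python API usage (only `ResearchContext` and `ChatSessionState` with `background_context` come from this entry; the import paths are assumptions):

```python
from haiku.rag.agents.chat import ChatSessionState      # import path is an assumption
from haiku.rag.agents.research import ResearchContext   # import path is an assumption

research_ctx = ResearchContext(
    background_context="Internal docs describe the v2 API only.",
)
session = ChatSessionState(
    background_context="The user is migrating from the legacy CLI.",
)
# Both feed the agents' system prompts and research graph planning.
```
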
## [0.26.4] - 2026-01-15

### Added

- AGUI_STATE_KEY Constant: Exported `AGUI_STATE_KEY` (`"haiku.rag.chat"`) from `haiku.rag.agents.chat` for namespaced AG-UI state emission (sketched below)
  - Enables integrators to use a consistent key when combining haiku.rag with other agents
  - Backend, TUI, and frontend now use this key for state emission and extraction

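A hedged sketch of namespaced emission from a pydantic-ai tool, combining this constant with the `ToolReturn`/`StateSnapshotEvent` pattern referenced in 0.26.0 below (the tool body and state payload are illustrative, not haiku.rag internals):

```python
from ag_ui.core import EventType, StateSnapshotEvent
from pydantic_ai.messages import ToolReturn

from haiku.rag.agents.chat import AGUI_STATE_KEY  # == "haiku.rag.chat"


async def search_tool(query: str) -> ToolReturn:
    results = f"results for {query!r}"  # illustrative placeholder
    return ToolReturn(
        return_value=results,
        metadata=[
            StateSnapshotEvent(
                type=EventType.STATE_SNAPSHOT,
                # Namespacing under AGUI_STATE_KEY lets haiku.rag state be
                # merged with other agents' state instead of clobbering it.
                snapshot={AGUI_STATE_KEY: {"last_query": query}},
            )
        ],
    )
```
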
## [0.26.3] - 2026-01-15

### Added

- Enhanced Database Info: `haiku-rag info` now displays the `pydantic-ai` version and the DoclingDocument schema version
- Keyed State Emission for Chat Agent: New `state_key` parameter in `ChatDeps` for namespaced AG-UI state snapshots
  - When set, tools emit `{state_key: snapshot}` instead of bare state, enabling state merging when multiple agents share state
  - Default `None` preserves backwards compatibility (bare state emission)
- Page Image Generation Control: New `generate_page_images` option in `ConversionOptions` to control PDF page image extraction
  - `generate_page_images: bool = True` - Enable/disable rendered page images (used by `visualize_chunk()`)
  - Works with both `docling-local` and `docling-serve` converters
  - For `docling-serve`, maps to the `image_export_mode` API parameter (embedded/placeholder)
  - Note: `generate_picture_images` (embedded figures/diagrams) works with the local converter but has limited support in docling-serve

### Changed

- CLI Error Handling: Commands (`rebuild`, `vacuum`, `create-index`, `ask`, `research`) now propagate errors with proper exit codes instead of swallowing exceptions

### Fixed

- Embed-only rebuild with changed vector dimensions: Fixed `haiku-rag rebuild --embed-only` failing when the configured embedding model has different dimensions than the database
  - Store now reads the stored vector dimension when opening existing databases, allowing chunks to be read regardless of the current config
  - `_rebuild_embed_only` recreates the chunks table to handle dimension changes

## [0.26.2] - 2026-01-13

### Changed

- Dependencies: Updated docling dependencies for latest docling-serve compatibility (#229)
  - `docling-core`: 2.57.0 → 2.59.0 (supports schema 1.9.0)
  - `docling`: 2.65.0 → 2.67.0

## [0.26.1] - 2026-01-13

### Fixed

- Docling Schema Version Mismatch: Fixed incompatibility between `docling` and `docling-core` causing `ValidationError: Doc version 1.9.0 incompatible with SDK schema version 1.8.0` when adding documents (#229)
  - Root cause: `docling-core` was reverted to 2.57.0 (schema 1.8.0) for docling-serve compatibility, but `docling` remained at 2.67.0 (schema 1.9.0)
  - Fix: Reverted `docling` from 2.67.0 to 2.65.0 to match the `docling-core` schema version

## [0.26.0] - 2026-01-13

### Added

- Conversational RAG Application: Full-stack application (`app/`) with a CopilotKit frontend and a pydantic-ai AG-UI backend
  - Next.js frontend with chat interface, citation display, and visual grounding
  - Starlette backend using pydantic-ai's native `AGUIAdapter` for streaming
  - Docker Compose setup for development (`docker-compose.dev.yml`) and production
  - Logfire integration for debugging LLM calls
  - SSE heartbeat to prevent connection timeouts
- Chat Agent (`haiku.rag.agents.chat`): New conversational RAG agent optimized for multi-turn chat
  - `create_chat_agent()` factory function for creating chat agents with AG-UI support
  - `SearchAgent` for internal query expansion with deduplication
  - `ChatDeps` and `ChatSessionState` for session management
  - `CitationInfo` and `QAResponse` models for structured responses
  - Natural language document filtering via `build_document_filter()`
  - Configurable search limit per agent
- Chat TUI (`haiku-rag chat`): Terminal-based chat interface using Textual
  - Single chat window with inline tool calls and expandable citations
  - Visual grounding (`v` key) reuses the inspector's `VisualGroundingModal`
  - Database info (`i` key) shows document/chunk counts and storage info
  - Keybindings: `q` quit, `Ctrl+L` clear chat, `Escape` focus input
- Q/A History Management: Intelligent conversation history with semantic ranking (see the sketch after this list)
  - FIFO queue with 50 max entries
  - Embedding cache to avoid re-embedding Q/A pairs
  - `rank_qa_history_by_similarity()` returns the top-K most relevant history entries
  - Confidence filtering to exclude low-confidence answers from context
- Conversational Research Graph: Simplified single-iteration research graph for chat
  - `build_conversational_graph()` optimized for conversational Q&A
  - Context-aware planning (generates fewer sub-questions when history exists)
  - `ConversationalAnswer` output type with a direct answer and citations

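The ranking step is standard cosine similarity over cached Q/A embeddings; a self-contained sketch of the idea (independent of the actual `rank_qa_history_by_similarity()` implementation):

```python
import math


def rank_history(
    query_embedding: list[float],
    history: list[tuple[str, list[float]]],  # (qa_text, cached_embedding)
    top_k: int = 3,
) -> list[str]:
    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.hypot(*a) * math.hypot(*b)
        return dot / norm if norm else 0.0

    ranked = sorted(history, key=lambda item: cosine(query_embedding, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]
```
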
### Changed

- BREAKING: Module Reorganization: Consolidated all agent code under `haiku.rag.agents` (import updates sketched below)
  - Moved `haiku.rag.qa` → `haiku.rag.agents.qa`
  - Moved `haiku.rag.graph.research` → `haiku.rag.agents.research`
  - Added `haiku.rag.agents.chat` module with the conversational RAG agent
  - Deleted the `haiku.rag.graph` module (research graph now at `haiku.rag.agents.research.graph`)

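For integrators, the consolidation mostly means updating imports; for example (assuming these names are exported at the new paths):

```python
# Before 0.26.0:
#   from haiku.rag.graph.research import build_research_graph
# From 0.26.0 on:
from haiku.rag.agents.chat import create_chat_agent
from haiku.rag.agents.research.graph import build_research_graph  # path is an assumption
```
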
### Removed

- BREAKING: Custom AG-UI Infrastructure: Removed custom AG-UI event handling in favor of pydantic-ai's native AG-UI support
  - Deleted the `haiku.rag.graph.agui` module (`AGUIEmitter`, `AGUIConsoleRenderer`, `stream_graph()`, `create_agui_server()`)
  - Removed the `--agui` flag from the `serve` command
  - Removed the `--verbose` flags from the `ask` and `research` commands
  - Removed the `--interactive` flag from the `research` command
  - Removed `AGUIConfig` from configuration
  - Deleted the `cli_chat.py` interactive chat module
  - Research graph now uses `graph.run()` directly instead of `stream_graph()`
  - For AG-UI streaming, use pydantic-ai's native `AGUIAdapter` with `ToolReturn` and `StateSnapshotEvent` (see `app/backend/` for an example)
- AG-UI Research Example: Removed `examples/ag-ui-research/` (replaced by `app/`)

## [0.25.0] - 2026-01-12

### Fixed

- Large Document Storage Overflow: Fixed "byte array offset overflow" panic when vacuuming/rebuilding databases with many large PDF documents (#225)
  - Root cause: Arrow's 32-bit string column offsets are limited to ~2GB per fragment
  - Changed `docling_document_json` (string) to `docling_document` (bytes) with the `large_binary` Arrow type (64-bit offsets; see the sketch below)
  - Added gzip compression for DoclingDocument JSON (~1.4x compression ratio)
  - Migration automatically compresses existing documents in batches to avoid memory issues
  - Breaking: Migration is destructive - all table version history is lost after the upgrade

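A minimal sketch of the storage technique (direct pyarrow/gzip usage for illustration; haiku.rag's actual schema code may differ):

```python
import gzip
import json

import pyarrow as pa

doc_json = json.dumps({"schema_name": "DoclingDocument", "texts": []})
compressed = gzip.compress(doc_json.encode("utf-8"))  # ~1.4x smaller for typical docs

# large_binary uses 64-bit offsets, avoiding the ~2GB-per-fragment ceiling
# that Arrow's 32-bit string/binary offsets impose.
schema = pa.schema([pa.field("docling_document", pa.large_binary())])
table = pa.table({"docling_document": [compressed]}, schema=schema)

restored = json.loads(gzip.decompress(table["docling_document"][0].as_py()))
```
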
### Changed

- Dependencies: Updated lancedb 0.26.0 → 0.26.1, docling 2.65.0 → 2.67.0

### Removed

- Legacy Migrations: Removed obsolete database migration files (`v0_9_3.py`, `v0_10_1.py`, `v0_19_6.py`). These migrations were for versions prior to 0.20.0 and are no longer needed since the current release requires a database rebuild anyway.

## [0.24.2] - 2026-01-08

### Fixed

- Base64 Images in Expanded Context: Fixed base64 image data leaking into expanded search results when `expand_context()` processed `PictureItem` objects. The issue was `PictureItem.export_to_markdown()` defaulting to `EMBEDDED` mode; it now explicitly uses `PLACEHOLDER` mode to prevent base64 data while still including VLM descriptions and captions.

## [0.24.1] - 2026-01-08

### Fixed

- OpenAI Non-Reasoning Models: Fixed the `reasoning_effort` parameter being sent to non-reasoning OpenAI models (gpt-4o, gpt-4o-mini), causing 400 errors. Now correctly detects reasoning models (o1, o3 series) using pydantic-ai's model profile.
- Bedrock Non-Reasoning Models: Fixed the same issue for OpenAI models on Bedrock.

## [0.24.0] - 2026-01-07

### Added

- VLM Picture Description: Describe embedded images using Vision Language Models during document conversion
  - Images are sent to a VLM for automatic description via an OpenAI-compatible API (see the sketch after this list)
  - Descriptions become searchable text, improving RAG retrieval for visual content
  - Configure via `processing.conversion_options.picture_description` with `enabled`, `model`, `timeout`, `max_tokens`
  - Default prompt customizable via `prompts.picture_description`
  - Requires an OpenAI-compatible `/v1/chat/completions` endpoint (Ollama, OpenAI, vLLM, LM Studio)

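Under the hood this is a plain OpenAI-style chat completion with an image attached; a self-contained sketch of that request shape (the URL and model name are placeholders, not haiku.rag internals):

```python
import base64

import httpx

image_b64 = base64.b64encode(open("figure.png", "rb").read()).decode()

response = httpx.post(
    "http://localhost:11434/v1/chat/completions",  # any OpenAI-compatible server
    json={
        "model": "qwen2.5vl",  # placeholder model name
        "max_tokens": 200,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image for search indexing."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    },
    timeout=60.0,
)
print(response.json()["choices"][0]["message"]["content"])
```
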
## [0.23.2] - 2026-01-05

### Fixed

- AG-UI Concurrent Step Tracking: Emitter now correctly tracks multiple concurrent steps (#216)

### Changed

- Dependencies: Updated core and development dependencies

## [0.23.1] - 2025-12-29

### Added

- Contextualized FTS Search: Full-text search now includes section headings
  - New `content_fts` column stores contextualized content (headings + body text)
  - FTS index now searches `content_fts` for better keyword matching on section context
  - Original `content` column preserved for display and context expansion
  - Migration automatically populates `content_fts` for existing databases
- GitHub Actions CI: Test workflow runs pytest, pyright, and ruff on push/PR to main
- VCR Cassette Recording: Integration tests use recorded HTTP responses for deterministic CI runs
  - LLM tests (QA, embeddings, research graph) replay from cassettes without real API calls
  - docling-serve tests run without a Docker container in CI
  - Uses pytest-recording with a custom JSON body serializer

## [0.23.0] - 2025-12-26

### Added

- Prompt Customization: Configure agent prompts via the `prompts` config section
  - `domain_preamble`: Prepended to all agent prompts for domain context
  - `qa`: Full replacement for the QA agent prompt
  - `synthesis`: Full replacement for the research synthesis prompt

### Changed

- Embeddings: Migrated to pydantic-ai's embeddings module
  - Uses pydantic-ai v1.39.0+ embeddings with instrumentation and token counting support
  - Explicit `embed_query()` and `embed_documents()` API for query/document distinction
  - New providers available: Cohere (`cohere:`), SentenceTransformers (`sentence-transformers:`)
  - VoyageAI refactored to extend pydantic-ai's `EmbeddingModel` base class
- Configuration: Added `base_url` to `ModelConfig` and `EmbeddingModelConfig`
  - Enables custom endpoints for OpenAI-compatible providers (vLLM, LM Studio, etc.)
  - Model-level `base_url` takes precedence over provider config

### Deprecated

- vLLM and LM Studio providers: Use the `openai` provider with `base_url` instead
  - `provider: vllm` → `provider: openai` with `base_url: http://localhost:8000/v1`
  - `provider: lm_studio` → `provider: openai` with `base_url: http://localhost:1234/v1`

### Removed

- Deleted obsolete embedder implementations: `ollama.py`, `openai.py`, `vllm.py`, `lm_studio.py`, `base.py`
- Removed `VLLMConfig` and `LMStudioConfig` from configuration (use `base_url` in the model config instead)

## [0.22.0] - 2025-12-19

### Added

- Read-Only Mode: Global `--read-only` CLI flag for safe database access without modifications
  - Blocks all write operations at the Store layer
  - Skips database upgrades and settings saves on open
  - Excludes write tools (`add_document_*`, `delete_document`) from the MCP server
  - Disables the file monitor with a warning when `--read-only` is used with `serve --monitor`
- Time Travel: Query the database as it existed at a previous point in time
  - Global `--before` CLI flag accepts datetime strings (ISO 8601 or date-only)
  - Automatically enables read-only mode when time-traveling
  - New `history` command shows version history for database tables
  - Useful for debugging and auditing
  - Supported throughout: CLI, Client, App, Inspector

### Fixed

- File Monitor Path Validation: Monitor now validates that directories exist before watching (#204)
  - Provides a clear error message pointing to the `haiku.rag.yaml` configuration
  - Prevents cryptic `FileNotFoundError: No path was found` from watchfiles
- Docker Documentation: Improved Docker setup instructions
  - Added volume mount examples for the config file and documents directory
  - Clarified that `monitor.directories` must use container paths, not host paths

### Changed

- Dependencies: Updated core dependencies
  - `pydantic-ai-slim`: 1.27.0 → 1.36.0 (FileSearchTool, web chat UI, GPT-5.2 support, prompt caching)
  - `lancedb`: 0.25.3 → 0.26.0
  - `docling`: 2.64.0 → 2.65.0
  - `docling-core`: 2.54.0 → 2.57.0

## [0.21.0] - 2025-12-18

### Added

- Interactive Research Mode: Human-in-the-loop research using graph-based decision nodes
  - `haiku-rag research --interactive` starts a conversational CLI chat
  - Natural language interpretation for user commands (search, modify questions, synthesize)
  - Chat with the assistant before starting research and during decision points
  - Review collected answers and pending questions at each decision point
  - Add, remove, or modify sub-questions through natural conversation
  - New `human_decide` graph node emits AG-UI tool calls (TOOL_CALL_START/ARGS/END) for frontend integration
  - New `emit_tool_call_start()`, `emit_tool_call_args()`, `emit_tool_call_end()` AG-UI event helpers
  - New `AGUIEmitter.emit()` method for direct event emission
- AG-UI Research Example: Human-in-the-loop research with client-side tool calling
  - Frontend handles `human_decision` tool calls via AG-UI `TOOL_CALL_*` events
  - Tool results sent directly to the backend `/v1/research/stream` endpoint
  - Backend queues decisions and continues the research graph
- HotpotQA Evaluation: Added a HotpotQA dataset adapter for multi-hop QA benchmarks
  - Extracts unique documents from validation set context paragraphs
  - Uses MAP for retrieval evaluation (multiple supporting documents per question)
  - Run with `evaluations hotpotqa`
- Plain Text Format: Added `format="plain"` for text conversion (see the sketch after this list)
  - Use when content is plain text without markdown/HTML structure
  - Falls back gracefully when docling cannot detect markdown format in content
  - Supported in `create_document()`, `convert()`, and all converter classes

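A minimal usage sketch (the `HaikuRAG` client import path and context-manager usage are assumptions):

```python
from haiku.rag.client import HaikuRAG  # import path is an assumption

async with HaikuRAG("my.lancedb") as client:
    await client.create_document(
        content="raw log output with *asterisks* that are not markdown",
        format="plain",  # skip markdown detection; treat content as plain text
    )
```
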
### Changed

- AG-UI Events: Replaced custom event classes with `ag_ui.core` types
  - Removed the `haiku.rag.graph.agui.events` module
  - Event factory functions (`emit_*`) now wrap official `ag_ui.core` event classes
- Chunker Sets Order: Chunkers now set `chunk.order` directly
- Unified Research Graph: Simplified and unified research and deep QA into a single configurable graph
  - Removed the `analyze_insights` node - the graph now flows directly from `collect_answers` to `decide`
  - Simplified `EvaluationResult` to: `is_sufficient`, `confidence_score`, `reasoning`, `new_questions`
  - Simplified `ResearchContext` - removed insight/gap tracking methods
  - `ask --deep` now uses the research graph with `max_iterations=2`, `confidence_threshold=0.0`
  - `ask --deep` output now shows an executive summary, key findings, and sources
  - Added `include_plan` parameter to `build_research_graph()` for plan-less execution
  - Added `max_iterations` and `confidence_threshold` overrides to `ResearchState.from_config()`
- Improved Synthesis Prompt: Updated the synthesis agent prompt to produce direct answers
  - Executive summary now directly answers the question instead of describing the report
  - Added explicit examples of good vs bad output style
- Evaluations Vacuum Strategy: `populate_db` now uses periodic vacuum to prevent disk exhaustion with large datasets
  - Disables auto_vacuum during population, vacuums every N documents with retention=0
  - New `--vacuum-interval` CLI option (default: 100) to control vacuum frequency
  - Prevents disk space issues when building databases with thousands of documents (e.g., HotpotQA)
- Benchmarks Documentation: Restructured benchmarks.md for clarity
  - Added a dedicated Methodology section explaining MRR, MAP, and QA Accuracy metrics
  - Organized results by dataset with retrieval and QA subsections

### Removed

- Deep QA Graph: Removed the `haiku.rag.graph.deep_qa` module entirely
  - Use `build_research_graph()` with appropriate parameters instead (sketched below)
  - The `ask --deep` CLI command now uses the research graph internally
- Insight/Gap Tracking: Removed over-engineered insight and gap tracking from the research graph
  - Removed the `InsightRecord`, `GapRecord`, `InsightAnalysis`, `InsightStatus`, `GapSeverity` models
  - Removed the `format_analysis_for_prompt()` helper
  - Removed `INSIGHT_AGENT_PROMPT` from prompts

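A hedged sketch of the replacement wiring, per the parameters listed under Changed above (the import paths and any `from_config()` arguments beyond the two documented overrides are assumptions):

```python
from haiku.rag.graph.research import build_research_graph  # path is an assumption
from haiku.rag.graph.research import ResearchState         # path is an assumption

# Deep-QA-style run: no upfront plan, two iterations, no confidence gate
graph = build_research_graph(include_plan=False)
state = ResearchState.from_config(
    question="What changed in the storage layer?",  # argument name is an assumption
    max_iterations=2,
    confidence_threshold=0.0,
)
```
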
## [0.20.2] - 2025-12-12

### Fixed

- LLM Schema Compliance: Improved prompts to prevent LLMs from returning objects instead of plain strings for `list[str]` fields
  - All graph prompts now explicitly state that list fields must contain plain strings only
  - Added missing `query` and `confidence` fields to the search agent output format documentation
  - Fixes validation errors with less capable models that ignore JSON schema constraints
- AG-UI Frontend Types: Fixed TypeScript interfaces in the ag-ui-research example to match backend Python models
  - `EvaluationResult`: `confidence` → `confidence_score`, `should_continue` → `is_sufficient`, `gaps_identified` → `gaps`, `follow_up_questions` → `new_questions`, added `key_insights`
  - `ResearchReport`: `question` → `title`, `summary` → `executive_summary`, `findings` → `main_findings`, removed `insights_used`/`methodology`, added `limitations`/`recommendations`/`sources_summary`
  - Updated the Final Report UI to display the new fields (Recommendations, Limitations, Sources)
- Citation Formatting: Citations in the CLI now render properly with Rich panels
  - Content is rendered as markdown with proper code block formatting
  - No longer truncates or flattens newlines in citation content

## [0.20.1] - 2025-12-11

### Added

- Search Filter for Graphs: Research and Deep QA graphs now support a `search_filter` parameter to restrict searches to specific documents (see the sketch after this list)
  - Set `state.search_filter` to a SQL WHERE clause (e.g., `"id IN ('doc1', 'doc2')"`) before running the graph
  - Enables document-scoped research workflows
  - CLI: `haiku-rag research "question" --filter "uri LIKE '%paper%'"`
  - CLI: `haiku-rag ask "question" --filter "title = 'My Doc'"`
  - Python: `client.ask(question, filter="...")` and `agent.answer(question, filter="...")`
- AG-UI Research Example: Added a bidirectional state demonstration with a document filter
  - New `/api/documents` endpoint to list available documents
  - Frontend document selector component with search and multi-select
  - Demonstrates client-to-server state flow via the AG-UI protocol
- Inspector Info Modal: New `i` keyboard shortcut opens a modal displaying database information

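A minimal sketch of document-scoped Q&A (the filter signature follows the entries above; client setup is an assumption):

```python
from haiku.rag.client import HaikuRAG  # import path is an assumption

async with HaikuRAG("my.lancedb") as client:
    answer = await client.ask(
        "What does the paper conclude?",
        filter="uri LIKE '%paper%'",  # any SQL WHERE clause over document fields
    )
```
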
### Changed

- Inspector Lazy Loading: Chunks panel now loads chunks in batches of 50 with infinite scroll
  - Fixes unresponsive UI when viewing documents with large numbers of chunks
  - New `ChunkRepository.get_by_document_id()` pagination with `limit` and `offset` parameters
  - New `ChunkRepository.count_by_document_id()` method

## [0.20.0] - 2025-12-10

### Added

- DoclingDocument Storage: The full DoclingDocument JSON is now stored with each document, enabling rich context and visual grounding
  - Documents store the complete DoclingDocument structure (JSON) and schema version
  - Chunks store metadata with JSON pointer references (`doc_item_refs`), semantic labels, section headings, and page numbers
  - New `ChunkMetadata` model for structured chunk provenance: `doc_item_refs`, `headings`, `labels`, `page_numbers`
  - `Document.get_docling_document()` method to parse the stored DoclingDocument
  - `ChunkMetadata.resolve_doc_items()` to resolve JSON pointer refs to actual DocItem objects
  - `ChunkMetadata.resolve_bounding_boxes()` for visual grounding with page coordinates
  - LRU cache (100 documents) for parsed DoclingDocument objects to avoid repeated JSON parsing
- Enhanced Search Results: `search()` and `expand_context()` now return full provenance information
  - `SearchResult` includes `page_numbers`, `headings`, `labels`, and `doc_item_refs`
  - QA and research agents use provenance for better citations (page numbers, section headings)
- Type-Aware Context Expansion: `expand_context()` now uses document structure for intelligent expansion
  - Structural content (tables, code blocks, lists) expands to complete structures regardless of chunking
  - Text content uses radius-based expansion via the `text_context_radius` setting
  - `max_context_items` and `max_context_chars` settings control expansion limits
  - `SearchResult.format_for_agent()` method formats expanded results with metadata for LLM consumption
- Visual Grounding: View page images with highlighted bounding boxes for chunks
  - Inspector modal with keyboard navigation between pages
  - CLI command: `haiku-rag visualize <chunk_id>`
  - Requires the `textual-image` dependency and a terminal with image support
- Processing Primitives: New methods for custom document processing pipelines (see the sketch after this list)
  - `convert()` - Convert files, URLs, or text to a DoclingDocument
  - `chunk()` - Chunk a DoclingDocument into Chunk objects
  - `contextualize()` - Prepend section headings to chunk content for embedding
  - `embed_chunks()` - Generate embeddings for chunks
- New `import_document()` Method: Import pre-processed documents with custom chunks
  - Accepts a `DoclingDocument` directly for rich metadata (visual grounding, page numbers)
  - Use when document conversion, chunking, or embedding were done externally
  - Chunks without embeddings are automatically embedded
- Automatic Chunk Embedding: `import_document()` and `update_document()` automatically embed chunks that don't have embeddings
  - Pass chunks with or without embeddings - missing embeddings are generated
  - Chunks with pre-computed embeddings are stored as-is
- Format Parameter for Text Conversion: New `format` parameter for `convert()` and `create_document()` to specify content type
  - Supports `"md"` (default) for markdown and `"html"` for HTML content
  - HTML format preserves document structure (headings, lists, sections) in the DoclingDocument
  - Enables proper parsing of HTML content that was previously treated as plain text
- Inspector Context Modal: Press `c` in the inspector to view expanded context for the selected chunk
- Auto-Vacuum Configuration: New `storage.auto_vacuum` setting to control automatic vacuuming behavior
  - When `true` (default), vacuum runs automatically after document create/update operations and rebuilds
  - When `false`, vacuum only runs via the explicit `haiku-rag vacuum` command
  - Disabling can help avoid potential crashes in high-concurrency scenarios due to LanceDB race conditions

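A sketch of a custom pipeline built from the primitives (method names per this release; exact signatures and the client import path are assumptions):

```python
from haiku.rag.client import HaikuRAG  # import path is an assumption

async with HaikuRAG("my.lancedb", create=True) as client:
    docling_doc = await client.convert("report.pdf")  # file/URL/text -> DoclingDocument
    chunks = await client.chunk(docling_doc)          # DoclingDocument -> list[Chunk]
    chunks = await client.embed_chunks(chunks)        # optional: chunks without embeddings
                                                      # are embedded on import anyway
    await client.import_document(docling_doc, chunks=chunks)  # kwarg name is an assumption
```
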
### Changed

- BREAKING: `create_document()` API: Removed the `chunks` parameter
  - `create_document()` now always processes content (converts, chunks, embeds)
  - Use `import_document()` for pre-processed documents with custom chunks
- BREAKING: `update_document()` API: Unified with `update_document_fields()` (see the sketch after this list)
  - Old: `update_document(document)` - pass a modified Document object
  - New: `update_document(document_id, content=, metadata=, chunks=, title=, docling_document=)`
  - `content` and `docling_document` are mutually exclusive
- BREAKING: Chunker Interface: `DocumentChunker.chunk()` now returns `list[Chunk]` instead of `list[str]`
  - Chunks include structured metadata (doc_item_refs, labels, headings, page_numbers)
- Search Config: New settings in the `search` section for search behavior and context expansion
  - `search.limit` - Default number of search results (default: 5). Used by CLI, MCP server, and API when no limit is specified
  - `search.context_radius` - DocItems before/after to include for text content expansion (default: 0)
  - `search.max_context_items` - Maximum items in expanded context (default: 10)
  - `search.max_context_chars` - Maximum characters in expanded context (default: 10000)
- Rebuild Performance: Batched database writes during the `rebuild` command reduce LanceDB versions by ~98%
  - All rebuild modes (FULL, RECHUNK, EMBED_ONLY) now batch writes across documents
  - Eliminates redundant per-document chunk deletions and vacuum calls
  - Significantly reduces storage overhead and improves rebuild speed for large databases
- Embedding Architecture: Moved embedding generation from `ChunkRepository` to the client layer
  - The repository is now a pure persistence layer
  - The client handles embedding via `_ensure_chunks_embedded()`
- Chunk Text Storage: Chunks store raw text; headings are prepended only at embedding time
  - Stored chunk content stays clean without duplicate heading prefixes
  - Local and serve chunkers now produce identical output
- Citation Models: Introduced `RawSearchAnswer` for LLM output and `SearchAnswer` with resolved citations
- Page Image Generation: Always enabled for the local docling converter (required for visual grounding)
- Download Models Progress: `haiku-rag download-models` now shows real-time progress with Rich progress bars for Ollama model downloads

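A before/after sketch of the unified update API (per the signature above; client setup as in the earlier sketches):

```python
# Old (pre-0.20.0): mutate and pass the whole Document
#   doc.content = "Revised body text"
#   await client.update_document(doc)

# New: field-based updates; content and docling_document are mutually exclusive
await client.update_document(
    "doc-id-123",                 # placeholder document id
    content="Revised body text",  # re-converted, re-chunked, re-embedded
    title="Revised Title",
)
```
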
### Removed

- BREAKING: `markdown_preprocessor` Config Option: Use the processing primitives (`convert()`, `chunk()`, `embed_chunks()`) for custom pipelines
- `update_document_fields()`: Merged into `update_document()`

### Migration

This release requires a database rebuild (run `haiku-rag rebuild`) to populate the new DoclingDocument fields. Existing documents without DoclingDocument data will work but won't have provenance information.

## [0.19.6] - 2025-12-03

### Changed

- BREAKING: Explicit Database Creation: Databases must now be explicitly created before use
  - New `haiku-rag init` command creates a new empty database
  - Python API: `HaikuRAG(path, create=True)` to create a database programmatically (see the sketch below)
  - Operations on non-existent databases raise `FileNotFoundError`
- BREAKING: Embeddings Configuration: Restructured to a nested `EmbeddingModelConfig`
  - Config path changed from `embeddings.{provider, model, vector_dim}` to `embeddings.model.{provider, name, vector_dim}`
  - Automatic migration upgrades existing databases to the new format
- Database Migrations: Always run when opening an existing database

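A minimal sketch (client import path as assumed in the other sketches):

```python
from haiku.rag.client import HaikuRAG  # import path is an assumption

# Opening a missing database now raises FileNotFoundError; create explicitly:
async with HaikuRAG("new.lancedb", create=True) as client:
    await client.create_document(content="hello")
```
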
## [0.19.5] - 2025-12-01

### Changed

- Rebuild Performance: Optimized `rebuild --embed-only` to use batch updates via LanceDB's `merge_insert` instead of individual chunk updates, and to skip chunks with unchanged embeddings

## [0.19.4] - 2025-11-28

### Added

- Rebuild Modes: New options for the `rebuild` command to control what gets rebuilt
  - `--embed-only`: Only regenerate embeddings, keeping existing chunks (fastest option when changing the embedding model)
  - `--rechunk`: Re-chunk from existing document content without accessing source files
  - Default (no flag): Full rebuild with source file re-conversion
  - Python API: `rebuild_database(mode=RebuildMode.EMBED_ONLY | RECHUNK | FULL)`

## [0.19.3] - 2025-11-27

### Changed

- Async Chunker: `DoclingServeChunker` now uses `httpx.AsyncClient` instead of sync `requests`

### Fixed

- OCR Options: Fixed `DoclingLocalConverter` using the base `OcrOptions` class, which docling's OCR factory doesn't recognize. Now uses `OcrAutoOptions` for automatic OCR engine selection.
- Dependencies: Added `opencv-python-headless` to the `docling` optional dependency for table structure detection.

## [0.19.2] - 2025-11-27

### Changed

- Async Converters: Made document converters fully async
  - `BaseConverter.convert_file()` and `convert_text()` are now async methods
  - `DoclingLocalConverter` wraps blocking Docling operations with `asyncio.to_thread()`
  - `DoclingServeConverter` now uses `httpx.AsyncClient` instead of sync `requests`
- Async Model Prefetch: `prefetch_models()` is now async
  - Uses `httpx.AsyncClient` for Ollama model pulls
  - Wraps blocking Docling and HuggingFace downloads with `asyncio.to_thread()`

## [0.19.1] - 2025-11-26

### Added

- LM Studio Provider: Added support for LM Studio as a provider for embeddings and QA/research models
  - Configure with `provider: lm_studio` in embeddings, QA, or research model settings
  - Supports thinking control for reasoning models (gpt-oss, etc.)
  - Default base URL: `http://localhost:1234`

### Fixed

- Configuration: Fixed the `init-config` command generating invalid configuration files (#165)
  - Refactored `generate_default_config()` to use Pydantic model serialization instead of manual dict construction
  - Updated the `qa`, `research`, and `reranking` sections to use the new `ModelConfig` structure

## [0.19.0] - 2025-11-25

### Added

- Model Customization: Added support for per-model configuration settings
  - New `enable_thinking` parameter to control reasoning behavior (true/false/None)
  - Support for `temperature` and `max_tokens` settings on QA and research models
  - All settings apply to any provider that supports them
- Database Inspector: New `inspect` CLI command launches an interactive TUI for browsing and searching documents and chunks
- Evaluations: Added an `evaluations` CLI script for running benchmarks (replaces `python -m evaluations.benchmark`)
- Evaluations: Added a `--db` option to override the evaluation database path
  - Default database location moved to the haiku.rag data directory:
    - macOS: `~/Library/Application Support/haiku.rag/evaluations/dbs/`
    - Linux: `~/.local/share/haiku.rag/evaluations/dbs/`
    - Windows: `C:/Users/<USER>/AppData/Roaming/haiku.rag/evaluations/dbs/`
  - Previously stored in `evaluations/data/` within the repository
- Evaluations: Added comprehensive experiment metadata tracking for better reproducibility
  - Records dataset name, test case count, and all model configurations
  - Tracks embedder settings: provider, model, and vector dimensions
  - Tracks QA model: provider and model name
  - Tracks judge model: provider and model name for LLM evaluation
  - Tracks processing parameters: `chunk_size` and `context_chunk_radius`
  - Tracks retrieval configuration: `retrieval_limit` for the number of chunks retrieved
  - Tracks reranking configuration: `rerank_provider` and `rerank_model`
  - Enables comparison of evaluation runs with different configurations in Logfire
- Evaluations: Refactored retrieval evaluation to use the pydantic-ai experiment framework
  - New `evaluators` module with `MRREvaluator` (Mean Reciprocal Rank) and `MAPEvaluator` (Mean Average Precision); reference implementations are sketched after this list
  - Retrieval benchmarks now use `Dataset.evaluate()` with full Logfire experiment tracking
  - Dataset specifications now declare their retrieval evaluator (MRR for RepliQA, MAP for Wix)
  - Replaced Recall@K and Success@K with the industry-standard MRR and MAP metrics
  - Unified evaluation framework for both retrieval and QA benchmarks
- AG-UI Events: Enhanced ActivitySnapshot events with richer structured data
  - Added a `stepName` field to identify which graph node emitted each activity
  - Added structured fields to activity content while preserving the backward-compatible `message` field:
    - Planning: `sub_questions` - list of sub-question strings
    - Searching: `query` - the search query, `confidence` - answer confidence (on success), `error` - error message (on failure)
    - Analyzing (research): `insights` - list of insight objects, `gaps` - list of gap objects, `resolved_gaps` - list of resolved gap strings
    - Evaluating (research): `confidence` - confidence score, `is_sufficient` - sufficiency flag
    - Evaluating (deep QA): `is_sufficient` - sufficiency flag, `iterations` - iteration count

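Reference implementations of the two metrics, using their standard definitions (independent of the evaluators module's actual code):

```python
def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1/rank of the first relevant document; 0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Mean of precision@k over the ranks where relevant documents appear."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# MRR and MAP are the means of these per-query scores across the dataset.
```
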
### Changed

- Evaluations: Renamed the `--qa-limit` CLI parameter to `--limit`; it now applies to both retrieval and QA benchmarks
- Evaluations: Retrieval evaluator selection moved from runtime logic to dataset configuration

## [0.18.0] - 2025-11-21

### Added

- Manual Vector Indexing: New `create-index` CLI command for explicit vector index creation
  - Creates IVF_PQ indexes
  - Requires a minimum of 256 chunks (LanceDB training data requirement)
  - New `search.vector_index_metric` config option: `cosine` (default), `l2`, or `dot`
  - New `search.vector_refine_factor` config option (default: 30) for the accuracy/speed tradeoff
  - Indexes are not created automatically during ingestion, to avoid performance degradation
  - Manual rebuilding is required after adding significant new data
- Enhanced Info Command: `haiku-rag info` now shows storage sizes and vector index statistics
  - Displays storage size for the documents and chunks tables in human-readable format
  - Shows vector index status (exists/not created)
  - Shows indexed and unindexed chunk counts for monitoring index staleness

### Changed

- BREAKING: Default Embedding Model: Changed the default embedding model from `qwen3-embedding` to `qwen3-embedding:4b` with vector dimension 2560 (previously 4096)
  - New installations will use the smaller, more efficient 4B parameter model by default
  - Action required: Existing databases created with the old default will be incompatible. Users must either:
    - Explicitly set `embeddings.model: "qwen3-embedding"` and `embeddings.vector_dim: 4096` in their config to maintain compatibility with existing databases
    - Or run `haiku-rag rebuild` to re-embed all documents with the new default
  - This change provides better performance for most use cases while reducing resource requirements
- Evaluations: Improved evaluation dataset naming and simplified evaluator configuration
  - `EvalDataset` now accepts a dataset name for better organization in Logfire
  - Added a `--name` CLI parameter to override evaluation run names
  - Removed the `IsInstance` evaluator, using only `LLMJudge` for QA evaluation
- Search Accuracy: Applied `refine_factor` to vector and hybrid searches for improved accuracy
  - Retrieves `refine_factor * limit` candidates and re-ranks them in memory
  - Higher values increase accuracy but slow down queries

### Fixed

- AG-UI Activity Events: Activity events now correctly use structured dict content instead of strings
- Graph Configuration: Graph builder functions now properly accept and use non-global config (#149)
  - `build_research_graph()` and `build_deep_qa_graph()` now pass config to all agents and model creation
  - The `get_model()` utility function accepts a `config` parameter (defaults to the global Config)
  - Allows creating multiple graphs with different configurations in the same application

## [0.17.2] - 2025-11-19

### Added

- Document Update API: New `update_document_fields()` method for partial document updates
  - Update individual fields (content, metadata, title, chunks) without fetching the full document
  - Support for custom chunks or auto-generation from content

### Changed

- Chunk Creation: `ChunkRepository.create()` now accepts both single chunks and lists for batch insertion
  - Batch insertion reduces LanceDB version creation when adding multiple chunks
  - Batch embedding generation for improved performance with multiple chunks
- Updated core dependencies

## [0.17.1] - 2025-11-18

### Added

- Conversion Options: Fine-grained control over document conversion for both local and remote converters
  - New `conversion_options` config section in `ProcessingConfig`
  - OCR settings: `do_ocr`, `force_ocr`, `ocr_lang` for controlling OCR behavior
  - Table extraction: `do_table_structure`, `table_mode` (fast/accurate), `table_cell_matching`
  - Image settings: `images_scale` to control image resolution
  - Options work identically with both `docling-local` and `docling-serve` converters

### Changed

- Increased the reranking candidate retrieval multiplier from 3x to 10x for improved result quality
- Docker Images: The main `haiku.rag` image is no longer automatically built and published
- Conversion Options: Removed the legacy `pdf_backend` setting; docling now chooses the optimal backend automatically

## [0.17.0] - 2025-11-17

### Added

- Remote Processing: Support for docling-serve as a remote document processing and chunking service
  - New `converter` config option: `docling-local` (default) or `docling-serve`
  - New `chunker` config option: `docling-local` (default) or `docling-serve`
  - New `providers.docling_serve` config section with `base_url`, `api_key`, and `timeout`
  - Comprehensive error handling for connection, timeout, and authentication issues
- Chunking Strategies: Support for both hybrid and hierarchical chunking
  - New `chunker_type` config option: `hybrid` (default) or `hierarchical`
  - Hybrid chunking: Structure-aware splitting that respects document boundaries
  - Hierarchical chunking: Preserves document hierarchy for nested documents
- Table Serialization Control: Configurable table representation in chunks
  - New `chunking_use_markdown_tables` config option (default: `false`)
  - `false`: Tables serialized as narrative text ("Value A, Column 2 = Value B")
  - `true`: Tables preserved in markdown format with structure
- Chunking Configuration: Additional chunking control options
  - New `chunking_merge_peers` config option (default: `true`) to merge undersized successive chunks
- Docker Images: Two Docker images for different deployment scenarios
  - `haiku.rag`: Full image with all dependencies for self-contained deployments
  - `haiku.rag-slim`: Minimal image designed for use with external docling-serve
  - Multi-platform support (linux/amd64, linux/arm64)
  - Docker Compose examples with docling-serve integration
  - Automated CI/CD workflows for both images
  - Build script (`scripts/build-docker-images.sh`) for local multi-platform builds

### Changed

- BREAKING: Chunking Tokenizer: Switched from tiktoken to HuggingFace tokenizers for consistency with docling-serve
  - Default tokenizer changed from tiktoken "gpt-4o" to "Qwen/Qwen3-Embedding-0.6B"
  - New `chunking_tokenizer` config option in `ProcessingConfig` for customization
  - The `download-models` CLI command now also downloads the configured HuggingFace tokenizer
- Docker Examples: Updated examples to demonstrate remote processing
  - `examples/docker` now uses the slim image with docling-serve
  - The `examples/ag-ui-research` backend uses the slim image with docling-serve
  - Configuration examples include remote processing setup

## [0.16.1] - 2025-11-14

### Changed

- Evaluations: Refactored the QA benchmark to run the entire dataset as a single evaluation for better Logfire experiment tracking
- Evaluations: Added `.env` file loading support via the `python-dotenv` dependency

## [0.16.0] - 2025-11-13

### Added

- AG-UI Protocol Support: Full AG-UI (Agent-UI) protocol implementation for graph execution with event streaming
  - New `AGUIEmitter` class for emitting AG-UI events from graphs
  - Support for all AG-UI event types: lifecycle events (`RUN_STARTED`, `RUN_FINISHED`, `RUN_ERROR`), step events (`STEP_STARTED`, `STEP_FINISHED`), state updates (`STATE_SNAPSHOT`, `STATE_DELTA`), activity narration (`ACTIVITY_SNAPSHOT`), and text messages (`TEXT_MESSAGE_CHUNK`)
  - `AGUIConsoleRenderer` for rendering AG-UI event streams to the terminal with Rich formatting
  - `stream_graph()` utility function for executing graphs with AG-UI event emission
  - State diff computation for efficient state synchronization
- Delta State Updates: The AG-UI emitter now supports incremental state updates via JSON Patch operations (`STATE_DELTA` events) to reduce bandwidth, configurable via the `use_deltas` parameter (enabled by default)
- AG-UI Server: Starlette-based HTTP server for serving graphs via the AG-UI protocol
  - Server-Sent Events (SSE) streaming endpoint at `/v1/agent/stream`
  - Health check endpoint at `/health`
  - Full CORS support configurable via the `agui` config section
  - `create_agui_server()` function for programmatic server creation
- Deep QA AG-UI Support: The Deep QA graph now fully supports AG-UI event streaming
  - Integration with `AGUIEmitter` for progress tracking
  - Step-by-step execution visibility via AG-UI events
- CLI AG-UI Flag: New `--agui` flag for the `serve` command to start the AG-UI server
- Graph Module: New unified `haiku.rag.graph` module containing all graph-related functionality
- Common Graph Nodes: New factory functions (`create_plan_node`, `create_search_node`) in `haiku.rag.graph.common.nodes` for reusable graph components
- AG-UI Research Example: New full-stack example (`examples/ag-ui-research`) demonstrating an agent+graph architecture with a CopilotKit frontend
  - Pydantic AI agent with a research tool that invokes the research graph
  - Custom AG-UI streaming endpoint with anyio memory streams
  - React/Next.js frontend with a split-pane UI showing live research state
  - Real-time progress tracking of questions, answers, insights, and gaps
  - Docker Compose setup for easy local development

### Changed

- Vacuum Retention: Default `vacuum_retention_seconds` increased from 60 seconds to 86400 seconds (1 day) for better version retention in typical workflows
- BREAKING: Major refactoring of graph-related code into the unified `haiku.rag.graph` module structure:
  - `haiku.rag.research` → `haiku.rag.graph.research`
  - `haiku.rag.qa.deep` → `haiku.rag.graph.deep_qa`
  - `haiku.rag.agui` → `haiku.rag.graph.agui`
  - `haiku.rag.graph_common` → `haiku.rag.graph.common`
- BREAKING: Research and Deep QA graphs now use the AG-UI event protocol instead of direct console logging
  - Removed the `console` and `stream` parameters from graph dependencies
  - All progress updates now emit through `AGUIEmitter`
- BREAKING: `ResearchState` converted from a dataclass to a Pydantic `BaseModel` for JSON serialization and AG-UI compatibility
- Research and Deep QA graphs now emit detailed execution events for better observability
- The CLI research command now uses AG-UI event rendering for `--verbose` output
- Improved graph execution visibility with step-by-step progress tracking
- Updated all documentation to reflect the new import paths and AG-UI usage
- Updated examples (ag-ui-research, a2a-server) to use the new import paths

### Fixed

- Document Creation: Optimized `create_document` to skip unnecessary DoclingDocument conversion when chunks are pre-provided
- FileReader: Error messages now include both the original exception details and the file path for easier debugging
- Database Auto-creation: Read operations (search, list, get, ask, research) no longer auto-create empty databases. Write operations (add, add-src, delete, rebuild) still create the database as needed. This prevents the confusing scenario where a search query creates an empty database. Fixes issue #137.

### Removed

- BREAKING: Removed the `disable_autocreate` config option - the behavior is now automatic based on operation type
- BREAKING: Removed the legacy `ResearchStream` and `ResearchStreamEvent` classes (replaced by the AG-UI event protocol)

## [0.15.0] - 2025-11-07

### Added

- File Monitor: Orphan deletion feature - automatically removes documents from the database when source files are deleted (enabled via the `monitor.delete_orphans` config option, default: false)

### Changed

- Configuration: All CLI commands now properly support the `--config` parameter for specifying custom configuration files
  - Configuration loading consolidated across CLI, app, and client with a consistent resolution order
  - `HaikuRAGApp` and the MCP server now accept a `config` parameter for programmatic configuration
  - Updated CLI documentation to clarify global vs per-command options
- BREAKING: Standardized the configuration filename to `haiku.rag.yaml` in user directories (was incorrectly using `config.yaml`). Users with an existing `config.yaml` in their user directory will need to rename it to `haiku.rag.yaml`.

### Fixed

- File Monitor: Fixed incorrect "Updated document" logging for unchanged files - the monitor now properly skips files when the MD5 hash hasn't changed

### Removed

- BREAKING: A2A (Agent-to-Agent) protocol support has been moved to a separate self-contained package in `examples/a2a-server/`. The A2A server is no longer part of the main haiku.rag package. Users who need A2A functionality can install and run it from the examples directory with `cd examples/a2a-server && uv sync`.
- BREAKING: Removed the deprecated `.env`-based configuration system. The `haiku-rag init-config --from-env` command and the `load_config_from_env()` function have been removed. All configuration must now be done via YAML files. Environment variables for API keys (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`) and service URLs (e.g., `OLLAMA_BASE_URL`) are still supported and can be set via `.env` files.

## [0.14.1] - 2025-11-06

### Added

- Migrated research and deep QA agents to the Pydantic Graph beta API for better graph execution
- Automatic semaphore-based concurrency control for parallel sub-question processing
- `max_concurrency` parameter for controlling parallel execution in research and deep QA (default: 1)

### Changed

- BREAKING: Research and Deep QA graphs now use `pydantic_graph.beta` instead of the class-based graph implementation
- Refactored common graph patterns into the `graph_common` module
- Sub-questions now process using `.map()` for true parallel execution
- Improved graph structure with cleaner node definitions and flow control
- Pinned critical dependencies: `docling-core`, `lancedb`, `docling`

## 0.14.0 - 2025-11-05

### Added

- New `haiku.rag-slim` package with minimal dependencies for users who want to install only what they need
- Evaluations package (`haiku.rag-evals`) for internal benchmarking and testing
- Improved search filtering performance by using pandas DataFrames for joins instead of SQL WHERE IN clauses

### Changed

- BREAKING: Restructured the project into a UV workspace with three packages:
  - `haiku.rag-slim` - Core package with minimal dependencies
  - `haiku.rag` - Full package with all extras (recommended for most users)
  - `haiku.rag-evals` - Internal benchmarking and evaluation tools
- Migrated from `pydantic-ai` to `pydantic-ai-slim` with the extras system
- Docling is now an optional dependency (install with `haiku.rag-slim[docling]`)
- Package metadata checks now use `haiku.rag-slim` (always present) instead of `haiku.rag`
- Docker image optimized: removed the evaluations package, reducing installed packages from 307 to 259
- Improved vector search performance through optimized score normalization

### Fixed

- ImportError is now properly raised when the optional docling dependency is missing

## 0.13.3 - 2025-11-04

### Added

- Support for the Zero Entropy reranker
- Filter parameter to `search()` for filtering documents before search
- Filter parameter to the CLI `search` command
- Filter parameter to the CLI `list` command for filtering document listings
- Config option to pass custom configuration files to evaluation commands
- Document filtering now respects configured include/exclude patterns when using `add-src` with directories
- Max retries for `insight_agent` when producing structured output

### Fixed

- CLI now loads `.env` files at startup
- Info command no longer attempts to use deprecated `.env` settings
- Documentation typos

## 0.13.2 - 2025-11-04

### Added

- Gitignore-style pattern filtering for file monitoring using pathspec
- Include/exclude pattern documentation for FileMonitor

### Changed

- Moved monitor configuration to its own section in the config
- Improved configuration documentation
- Updated dependencies

## 0.13.1 - 2025-11-03

### Added

- Initial version tracking