Agents
haiku.rag provides three agentic flows:
- Simple QA Agent — a focused question answering agent
- Deep QA Agent — multi-agent question decomposition for complex questions
- Research Multi‑Agent — a multi‑step, analyzable research workflow
For an interactive example using Pydantic AI and AG-UI, see the Interactive Research Assistant example (demo video). The demo uses a knowledge base containing haiku.rag's code and documentation.
See QA and Research Configuration for configuring model, iterations, concurrency, and other settings.
Simple QA Agent
The simple QA agent answers a single question using the knowledge base. It retrieves relevant chunks, optionally expands context around them, and asks the model to answer strictly based on that context.
Key points:
- Uses a single search_documents tool to fetch relevant chunks
- Can be run with or without inline citations in the prompt (citations prefer document titles when present, otherwise URIs)
- Returns a plain string answer
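CLI usage (the ask command without the --deep flag shown later runs this simple agent; presumably --cite applies here as well):
# Simple QA
haiku-rag ask "What is climate change?"
# Simple QA with citations
haiku-rag ask "What is climate change?" --cite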
Python usage:
from haiku.rag.client import HaikuRAG
from haiku.rag.qa.agent import QuestionAnswerAgent
async with HaikuRAG(path_to_db) as client:
    # Choose a provider and model (see Configuration for env defaults)
    agent = QuestionAnswerAgent(
        client=client,
        provider="openai",  # or "ollama", "vllm", etc.
        model="gpt-4o-mini",
        use_citations=False,  # set True to bias the prompt towards citing sources
    )
    answer = await agent.answer("What is climate change?")
    print(answer)
Deep QA Agent
Deep QA is a multi-agent system that decomposes complex questions into sub-questions, answers them in batches, evaluates sufficiency, and iterates if needed before synthesizing a final answer. It's lighter than the full research workflow but more powerful than the simple QA agent.
---
title: Deep QA graph
---
stateDiagram-v2
    [*] --> plan
    plan --> get_batch
    get_batch --> search_one: Has questions (map)
    get_batch --> synthesize: No questions
    search_one --> collect_answers
    collect_answers --> decide
    decide --> get_batch: Continue QA
    decide --> synthesize: Done with QA
    synthesize --> [*]
Key nodes:
- plan: Decomposes the question into focused sub-questions using a presearch tool
- get_batch: Retrieves remaining sub-questions for the current iteration
- search_one: Answers a single sub-question using the knowledge base (mapped in parallel)
- collect_answers: Aggregates search results from parallel executions
- decide: Evaluates if sufficient information has been gathered or if more iterations are needed
- synthesize: Generates the final comprehensive answer from all gathered information
Key differences from Research:
- Simpler evaluation: Uses sufficiency check (not confidence + insight analysis)
- Direct answers: Returns just the answer (not a full research report)
- Question-focused: Optimized for answering specific questions, not open-ended research
- Supports citations: Can include inline source citations like [document.md]
- Configurable iterations: Control max_iterations (default: 2) and max_concurrency (default: 1)
Note on parallel execution:
- The search_one node is mapped over all questions in a batch
- Parallelism is controlled via max_concurrency
- All questions in an iteration are processed before evaluation
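The note above can be pictured with a small asyncio sketch. This is illustrative only, not haiku.rag's internal code; answer_one stands in for whatever search_one actually does:

import asyncio

async def answer_one(question: str) -> str:
    # Stand-in for the real knowledge-base lookup performed by search_one
    return f"answer to {question!r}"

async def run_batch(questions: list[str], max_concurrency: int = 1) -> list[str]:
    # A semaphore caps in-flight sub-questions, mirroring max_concurrency
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(question: str) -> str:
        async with semaphore:
            return await answer_one(question)

    # gather waits for the whole batch, so evaluation only ever sees
    # a fully completed iteration
    return list(await asyncio.gather(*(bounded(q) for q in questions)))

print(asyncio.run(run_batch(["q1", "q2", "q3"], max_concurrency=2)))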
CLI usage:
# Deep QA without citations
haiku-rag ask "What are the main features of haiku.rag?" --deep
# Deep QA with citations
haiku-rag ask "What are the main features of haiku.rag?" --deep --cite
Python usage:
from haiku.rag.client import HaikuRAG
from haiku.rag.config import Config
from haiku.rag.graph.deep_qa.dependencies import DeepQAContext
from haiku.rag.graph.deep_qa.graph import build_deep_qa_graph
from haiku.rag.graph.deep_qa.state import DeepQADeps, DeepQAState
async with HaikuRAG(path_to_db) as client:
    # Use global config (recommended)
    graph = build_deep_qa_graph(config=Config)
    context = DeepQAContext(
        original_question="What are the main features of haiku.rag?",
        use_citations=True,
    )
    state = DeepQAState.from_config(context=context, config=Config)
    deps = DeepQADeps(client=client)
    result = await graph.run(state=state, deps=deps)
    print(result.answer)
    print(result.sources)
Alternative usage with custom config:
# Create a custom config with different settings
from haiku.rag.config.models import AppConfig, QAConfig
custom_config = AppConfig(
    qa=QAConfig(
        provider="openai",
        model="gpt-4o-mini",
        max_sub_questions=5,
        max_iterations=3,
        max_concurrency=2,
    )
)

graph = build_deep_qa_graph(config=custom_config)
context = DeepQAContext(
    original_question="What are the main features of haiku.rag?",
    use_citations=True,
)
state = DeepQAState.from_config(context=context, config=custom_config)
deps = DeepQADeps(client=client)
result = await graph.run(state=state, deps=deps)
Research Graph
The research workflow is implemented as a typed pydantic‑graph. It plans, searches (in parallel batches), evaluates, and synthesizes into a final report — with clear stop conditions and shared state.
---
title: Research graph
---
stateDiagram-v2
    [*] --> plan
    plan --> get_batch
    get_batch --> search_one: Has questions (map)
    get_batch --> synthesize: No questions
    search_one --> collect_answers
    collect_answers --> analyze_insights
    analyze_insights --> decide
    decide --> get_batch: Continue research
    decide --> synthesize: Done researching
    synthesize --> [*]
Key nodes:
- plan: Builds up to 3 standalone sub‑questions (uses an internal presearch tool)
- get_batch: Retrieves remaining sub‑questions for the current iteration
- search_one: Answers a single sub‑question using the KB with minimal, verbatim context (mapped in parallel)
- collect_answers: Aggregates search results from parallel executions
- analyze_insights: Synthesizes fresh insights, updates gaps, and suggests new sub-questions
- decide: Checks sufficiency/confidence thresholds and determines whether to continue research (a rough sketch of this check follows the list)
- synthesize: Generates a final structured research report
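The decide step's stop logic can be approximated as follows. This is an assumption built from the confidence_threshold and max_iterations fields shown in the ResearchConfig example below, not the actual source; the real node may weigh additional signals such as outstanding gaps:

def should_continue(confidence: float, iteration: int,
                    confidence_threshold: float = 0.85,
                    max_iterations: int = 5) -> bool:
    # Keep researching only while confidence is below the threshold
    # and the iteration budget has not been exhausted
    return confidence < confidence_threshold and iteration < max_iterations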
Primary models:
- SearchAnswer — one per sub‑question (query, answer, context, sources)
- InsightRecord / GapRecord — structured tracking of findings and open issues
- InsightAnalysis — output of the analysis stage (insights, gaps, commentary)
- EvaluationResult — insights, new questions, sufficiency, confidence
- ResearchReport — final report (title, executive summary, findings, conclusions, …)
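As a rough mental model, the two ends of the pipeline can be sketched as pydantic models. Field names are taken from the descriptions above; the actual class definitions in haiku.rag may differ, and fields elided above are omitted here:

from pydantic import BaseModel

class SearchAnswer(BaseModel):
    # One per sub-question
    query: str
    answer: str
    context: str
    sources: list[str]

class ResearchReport(BaseModel):
    # Final structured report
    title: str
    executive_summary: str
    findings: list[str]
    conclusions: list[str]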
Note on parallel execution:
- The search_one node is mapped over all questions in a batch
- Parallelism is controlled via max_concurrency
- Analysis and decision nodes process results after each batch completes
CLI usage:
# Basic usage (uses config from file or defaults)
haiku-rag research "How does haiku.rag organize and query documents?" --verbose
# With custom config file
haiku-rag --config my-research-config.yaml research "How does haiku.rag organize and query documents?" --verbose
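Such a config file might look like the following. This shape is an assumption: the keys mirror the AppConfig/ResearchConfig fields used in the Python examples below and have not been checked against haiku.rag's config loader:

# my-research-config.yaml (hypothetical)
research:
  provider: openai
  model: gpt-4o-mini
  max_iterations: 5
  confidence_threshold: 0.85
  max_concurrency: 3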
Python usage (blocking result):
from haiku.rag.client import HaikuRAG
from haiku.rag.config import Config
from haiku.rag.graph.research.dependencies import ResearchContext
from haiku.rag.graph.research.graph import build_research_graph
from haiku.rag.graph.research.state import ResearchDeps, ResearchState
async with HaikuRAG(path_to_db) as client:
    # Use global config (recommended)
    graph = build_research_graph(config=Config)

    question = "What are the main drivers and trends of global temperature anomalies since 1990?"
    context = ResearchContext(original_question=question)
    state = ResearchState.from_config(context=context, config=Config)
    deps = ResearchDeps(client=client)

    report = await graph.run(state=state, deps=deps)
    print(report.title)
    print(report.executive_summary)
Alternative usage with custom config:
from haiku.rag.config.models import AppConfig, ResearchConfig
custom_config = AppConfig(
    research=ResearchConfig(
        provider="openai",
        model="gpt-4o-mini",
        max_iterations=5,
        confidence_threshold=0.85,
        max_concurrency=3,
    )
)

graph = build_research_graph(config=custom_config)
context = ResearchContext(original_question=question)
state = ResearchState.from_config(context=context, config=custom_config)
deps = ResearchDeps(client=client)
result = await graph.run(state=state, deps=deps)
Python usage (streamed AG-UI events):
from haiku.rag.graph.agui import stream_graph
from haiku.rag.client import HaikuRAG
from haiku.rag.config import Config
from haiku.rag.graph.research.dependencies import ResearchContext
from haiku.rag.graph.research.graph import build_research_graph
from haiku.rag.graph.research.state import ResearchDeps, ResearchState
async with HaikuRAG(path_to_db) as client:
    graph = build_research_graph(config=Config)

    question = "What are the main drivers and trends of global temperature anomalies since 1990?"
    context = ResearchContext(original_question=question)
    state = ResearchState.from_config(context=context, config=Config)
    deps = ResearchDeps(client=client)

    async for event in stream_graph(graph, state, deps):
        if event["type"] == "STEP_STARTED":
            print(f"Starting step: {event['stepName']}")
        elif event["type"] == "ACTIVITY_SNAPSHOT":
            # Activity events include structured data alongside messages
            content = event["content"]
            print(f"  {content['message']}")
            # Different activity types carry different structured fields
            if "confidence" in content:
                print(f"  Confidence: {content['confidence']:.0%}")
            if "sub_questions" in content:
                for q in content["sub_questions"]:
                    print(f"    - {q}")
            if "insights" in content:
                print(f"  New insights: {len(content['insights'])}")
        elif event["type"] == "RUN_FINISHED":
            print("\nResearch complete!\n")
            result = event["result"]
            print(result["title"])
            print(result["executive_summary"])