Agent-to-Agent (A2A) Protocol

The A2A server exposes haiku.rag as a conversational agent using the Agent-to-Agent protocol. Unlike the MCP server which provides stateless tools, the A2A agent maintains conversation history and context across multiple turns.

Features

Conversational Context: Maintains full conversation history including tool calls and results
Multi-turn Dialogue: Supports follow-up questions with pronoun resolution ("he", "it", "that document")
Intelligent Search: Performs single or multiple searches depending on question complexity
Source Citations: Always includes sources with both titles and URIs
Full Document Retrieval: Can fetch complete documents on request
Multiple Skills: Exposes three distinct skills with appropriate artifacts:
document-qa: Conversational question answering (default)
document-search: Semantic search with structured results
document-retrieve: Fetch complete documents by URI

Starting A2A Server

haiku-rag serve --a2a

Server options: - --a2a-host - Host to bind to (default: 127.0.0.1) - --a2a-port - Port to bind to (default: 8000)

Example:

haiku-rag serve --a2a --a2a-host 0.0.0.0 --a2a-port 8080

Interactive A2A Client

Note

The interactive A2A client is an excellent way to do conversational research with haiku.rag.

Test and interact with haiku.rag's A2A server using the built-in interactive client:

haiku-rag a2aclient

Client options: - --url - Base URL of the A2A server (default: http://localhost:8000)

Example:

# Connect to local server
haiku-rag a2aclient

# Connect to remote server
haiku-rag a2aclient --url https://example.com:8000

The interactive client provides:

Rich markdown rendering of agent responses
Conversation context across multiple turns
Agent card discovery and display
Compact artifact summaries

Requirements

A2A support requires the a2a extra:

uv pip install 'haiku.rag[a2a]'

Python Usage

from pathlib import Path
from haiku.rag.a2a import create_a2a_app
import uvicorn

# Create A2A app
app = create_a2a_app(Path("database.lancedb"))

# Run with uvicorn
uvicorn.run(app, host="127.0.0.1", port=8000)

This installs the fasta2a package and its dependencies.

Architecture

The A2A agent uses:

FastA2A: Python framework implementing the A2A protocol
Pydantic AI: Agent framework with tool support
In-Memory Storage: Context and message history storage (persists during server lifetime)
Conversation State: Full pydantic-ai message history serialized in A2A context

Message History

The agent stores the complete conversation state including:

User prompts
Agent responses
Tool calls and their arguments
Tool return values

This enables the agent to:

Reference previous searches
Understand pronouns and context
Maintain coherent multi-turn conversations

Context Management

Each conversation is identified by a context_id. All messages within the same context share conversation history. This allows the agent to:

Remember what was discussed
Track which documents were already found
Provide contextual follow-up answers

Skills

The agent exposes three skills:

document-qa (default): Conversational question answering including follow-ups and multi-turn dialogue
document-search: Direct semantic search returning formatted results
document-retrieve: Fetch complete document content by URI

Artifacts

All operations create artifacts for traceability:

search_results: Created for each search_documents tool call
Contains query and array of SearchResult objects (content, score, document_title, document_uri)
document: Created for each get_full_document tool call
Contains complete document text
qa_result: Created for all document-qa operations
Contains question, answer, and skill identifier
Always created for Q&A, even when answering from conversation history without tools

Memory Management

To prevent memory growth, the server uses LRU (Least Recently Used) eviction:

Maximum 1000 contexts kept in memory (configurable via A2A_MAX_CONTEXTS)
When limit exceeded, least recently used contexts are automatically evicted

Configure via environment variable:

export A2A_MAX_CONTEXTS=1000

Security

By default, the A2A agent runs without authentication. For production deployments, you should add authentication.

Adding Authentication

The create_a2a_app() function accepts optional security parameters that declare authentication requirements in the agent card:

from haiku.rag.a2a import create_a2a_app

app = create_a2a_app(
    db_path,
    security_schemes={
        "apiKeyAuth": {
            "type": "apiKey",
            "in": "header",
            "name": "X-API-Key",
            "description": "API key authentication",
        }
    },
    security=[{"apiKeyAuth": []}],
)

This populates the agent card at /.well-known/agent-card.json so other agents can discover your authentication requirements.

Security Examples

Three working examples are provided in examples/a2a-security/:

API Key (apikey_example.py) - Simple header-based authentication
OAuth2 GitHub (oauth2_github.py) - GitHub Personal Access Token authentication
OAuth2 Enterprise (oauth2_example.py) - Full OAuth2 with JWT verification

Each example shows:

How to declare security in the agent card
How to implement authentication middleware
How to verify credentials