Cairn - Repository Lifecycle Canvas

Storage Boundary

One committed policy, generated runtime state.

The repository keeps its documentation policy close to the code. The expensive derived artifacts stay local and can be regenerated.

Commit

.cairn/config.toml defines include/exclude globs, primary doc, MarkItDown policy, locale preference, and repo search diversity.
Source docs stay in normal repository paths: README, docs, specs, ADRs, PDFs.
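The include/exclude policy can be pictured as a simple path filter. The sketch below is illustrative only: the glob values and the function names are hypothetical, and it uses Python's fnmatch rather than whatever matcher Cairn actually ships.

```python
from fnmatch import fnmatchcase

# Hypothetical globs; the real ones live in .cairn/config.toml.
INCLUDE = ["README.md", "docs/*.md", "specs/*.md", "*.pdf"]
EXCLUDE = ["docs/drafts/*", "node_modules/*"]


def is_source_doc(path, include=INCLUDE, exclude=EXCLUDE):
    """Return True when path matches an include glob and no exclude glob."""
    # fnmatch's "*" also matches "/", so "docs/*.md" covers nested folders.
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(path, pattern) for pattern in include)


def discover(paths):
    """Filter a list of repo-relative paths down to source documents."""
    return sorted(p for p in paths if is_source_doc(p))


if __name__ == "__main__":
    print(discover(["README.md", "docs/guide/setup.md", "src/main.py"]))
```

Checking excludes first keeps the policy conservative: a path that matches both lists stays out of the index.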

Generate

.cairn/manifest.json records freshness and document states.
.cairn/documents/<doc_id>/ stores one normal Cairn index per source document.
Standalone inspector HTML can be generated for local review.
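One way to think about manifest freshness is a content hash recorded per document at sync time and compared against the working tree later. The sketch below assumes a manifest layout of {"documents": {path: {"sha256": ...}}} and four state names; both are hypothetical and not taken from Cairn's actual manifest.json schema.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """SHA-256 hex digest of a document's bytes."""
    return hashlib.sha256(data).hexdigest()


def doc_states(manifest: dict, sources: dict) -> dict:
    """Classify each path as fresh, stale, new, or missing.

    manifest: {"documents": {path: {"sha256": str}}}  (assumed layout)
    sources:  {path: bytes} for documents currently in the repo
    """
    recorded = manifest.get("documents", {})
    states = {}
    for path, data in sources.items():
        entry = recorded.get(path)
        if entry is None:
            states[path] = "new"        # never indexed
        elif entry["sha256"] == content_hash(data):
            states[path] = "fresh"      # index matches the source
        else:
            states[path] = "stale"      # source changed since last sync
    for path in recorded:
        if path not in sources:
            states[path] = "missing"    # indexed, but source is gone
    return states
```

Because only the hash comparison touches the source files, a check like this stays cheap even when rebuilding the index itself is expensive.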

Connect

CLI uses the same repo index as MCP.
MCP stdio lets Claude Code, Cursor, Codex, Goose, and other clients query the repo locally.
Embedders and summarizers are provider plugins; fake providers keep smoke tests offline.

Research Lineage

The lifecycle is where the retrieval ideas become repo operations.

Cairn does not reproduce BookRAG, RAPTOR, or A-RAG paper-for-paper. It turns their useful retrieval patterns into a repository workflow with a stable CLI, generated indexes, and MCP tools.

BookRAG -> DocumentIndex

During sync, every source document becomes its own structure-aware index: section tree, summaries, entities, xrefs, and vectors. The document stays the primary retrieval unit instead of disappearing into anonymous chunks.

RAPTOR -> summaries

Cairn keeps the multi-level summary idea, but anchors it to the author's heading tree. Agents can start with gists and synopses, then expand only the sections that justify full text.
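The gist-first, expand-later pattern can be sketched with a small heading tree. Everything here is a toy model: the Section fields and the gist_outline and expand_section helpers are hypothetical names, not Cairn's index format or its MCP tools.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    id: str
    title: str
    gist: str                      # short summary shown first
    text: str = ""                 # full text, fetched only on demand
    children: list = field(default_factory=list)


def gist_outline(section, depth=0, max_depth=1):
    """Return gist lines for the heading tree, down to max_depth."""
    lines = [f"{'  ' * depth}{section.id} {section.title}: {section.gist}"]
    if depth < max_depth:
        for child in section.children:
            lines.extend(gist_outline(child, depth + 1, max_depth))
    return lines


def expand_section(section, wanted_id):
    """Return the full text of one section id, or None if it is absent."""
    if section.id == wanted_id:
        return section.text
    for child in section.children:
        found = expand_section(child, wanted_id)
        if found is not None:
            return found
    return None
```

An agent would read the outline, decide which section id looks relevant, and pay for full text only on that one section.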

A-RAG -> MCP tools

The agentic part lives in the tool surface: repo discovery first, then document drilldown. Cairn exposes typed retrieval tools and lets the MCP client plan, inspect, and cite.

Cairn -> repo product

Repo config, manifest freshness, stale detection, per-document failure isolation, global hybrid ranking, and hit explanations are the engineering layer that makes those ideas usable in real repos.

Operational Loop

The repeatable workflow is small.

A maintainer should be able to explain the whole system in six commands. Agents see a stable tool surface; humans get local diagnostics before putting the index behind MCP.

Bootstrap

docsgraph init -y creates a conservative repo docs policy.

Build

docsgraph sync --fake indexes every discovered document with deterministic local providers.

Check

docsgraph status and docsgraph doctor show freshness, routing, and provider health.

Search

docsgraph query repo "..." mirrors the hybrid ranker; MCP clients can call repo_context for a full context pack.

Serve

docsgraph serve --fake starts a repo-scoped MCP server with structured envelopes.

Iterate

Edit docs, watch status become stale, sync again, and keep the committed policy stable.
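For CI or a local smoke test, the loop above could be scripted. This is a minimal sketch that uses only the commands named in this section; the helper names are hypothetical, the query text is a caller-supplied placeholder, and docsgraph serve --fake is left out because it is a long-running server rather than a one-shot step.

```python
import shutil
import subprocess


def loop_commands(query):
    """Return the one-shot commands from the loop as argv lists."""
    return [
        ["docsgraph", "init", "-y"],
        ["docsgraph", "sync", "--fake"],
        ["docsgraph", "status"],
        ["docsgraph", "doctor"],
        ["docsgraph", "query", "repo", query],
    ]


def run_loop(query):
    """Run each command in order, stopping at the first failure.

    Returns False without running anything if docsgraph is not on PATH.
    """
    if shutil.which("docsgraph") is None:
        return False
    for argv in loop_commands(query):
        subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return True
```

Passing argv lists instead of a shell string avoids quoting problems when the query contains spaces or quotes.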

Search Contract

Repo search finds the document; document tools prove the answer.

Cairn's repo mode is intentionally two-stage. First, search across the repository to pick candidate sections. Then pass the chosen document as doc to the normal document tools to inspect exact structure and text.
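From an MCP client's point of view, the two stages are two tool calls. The sketch below builds the JSON-RPC 2.0 requests that the MCP tools/call method expects. The tool names come from this document, but the query argument name and all argument values are assumptions made for illustration.

```python
import json


def tool_call(request_id, name, arguments):
    """Build an MCP JSON-RPC 2.0 'tools/call' request as a dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Stage 1: repo-wide search. The "query" argument name is an assumption.
search = tool_call(1, "search_documents", {"query": "how is freshness tracked"})

# Stage 2: drill into one hit. The doc and id values are placeholders.
section = tool_call(2, "get_section", {"doc": "docs/architecture.md", "id": "manifest"})

# One JSON object per line, as a stdio transport would carry them.
for request in (search, section):
    print(json.dumps(request))
```

In practice an MCP client library builds these envelopes; the point is only that discovery and proof are separate, inspectable calls.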

Global ranker: search_documents blends dense vectors, lexical field support, BM25-style sparse evidence, doc/path identity, and graph-neighborhood support.

Context pack: repo_context composes ranked hits, compact section content, hit explanations, local relationships, and a relationship map in one MCP call.

Explainability: Each hit carries signal scores, configured weights, matched terms, dominant signal, identity bonus, rank factor, and short notes.
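A hybrid score with an attached explanation can be sketched as a weighted sum that keeps its per-signal breakdown. The weights, signal names, and output keys below are hypothetical, and the sketch covers only signal scores, weights, and the dominant signal, not the full explanation payload listed above.

```python
# Hypothetical weights; in Cairn these would come from configuration.
WEIGHTS = {"dense": 0.4, "lexical": 0.2, "sparse": 0.2, "identity": 0.1, "graph": 0.1}


def explain_hit(signals, weights=WEIGHTS):
    """Blend per-signal scores into one score and keep the breakdown."""
    contributions = {
        name: weight * signals.get(name, 0.0) for name, weight in weights.items()
    }
    return {
        "score": sum(contributions.values()),
        "signals": dict(signals),
        "weights": dict(weights),
        "contributions": contributions,
        # The signal that contributed most to the final score.
        "dominant_signal": max(contributions, key=contributions.get),
    }


def rank(hits):
    """Sort {"id": ..., "signals": {...}} hits by blended score, best first."""
    scored = [{"id": h["id"], **explain_hit(h["signals"])} for h in hits]
    return sorted(scored, key=lambda h: h["score"], reverse=True)
```

Keeping contributions next to the score lets a client see why a hit ranked where it did instead of trusting an opaque number.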

Graph + impact: repo_graph returns the docs relationship map. repo_impact reports derived artifacts and docs surfaces affected by document or section changes.

Diversity: sections_per_doc defaults to the value in the repo config, so agents can discover the right document before going deep.
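Assuming sections_per_doc acts as a cap on how many sections a single document may contribute to one result list (an interpretation, not something this page states), the diversity step could look like the sketch below. The hit shape and the function name are hypothetical.

```python
def diversify(ranked_hits, sections_per_doc):
    """Keep at most sections_per_doc hits per document, preserving rank order."""
    kept = []
    per_doc = {}
    for hit in ranked_hits:
        count = per_doc.get(hit["doc"], 0)
        if count < sections_per_doc:
            kept.append(hit)
            per_doc[hit["doc"]] = count + 1
    return kept
```

With a cap of 2, a document holding the top five sections would still leave room for the best sections of other documents in the same list.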

Drilldown: get_section(doc=..., id=...), outline(doc=...), expand(doc=...), and read_range(doc=...) return exact slices with stable anchors.

Isolation: Bad or incompatible documents are reported as skipped documents instead of taking down the whole repo query.
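Per-document failure isolation amounts to catching errors at the document boundary and reporting them next to the results. A minimal sketch follows, with a hypothetical search_one callback and hypothetical result keys standing in for Cairn's real per-document search and response envelope.

```python
def query_repo(doc_ids, search_one, query):
    """Search every document; collect failures instead of raising.

    search_one(doc_id, query) is a hypothetical per-document search
    callback that returns a list of hits or raises on a bad index.
    """
    hits = []
    skipped = []
    for doc_id in doc_ids:
        try:
            hits.extend(search_one(doc_id, query))
        except Exception as exc:
            # One broken document must not fail the whole repo query.
            skipped.append({"doc": doc_id, "reason": f"{type(exc).__name__}: {exc}"})
    return {"hits": hits, "skipped_documents": skipped}
```

Returning the skipped list alongside the hits keeps the failure visible to the client rather than silently shrinking the result set.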

Product shape: Cairn is not a chat agent inside your repo. It is a local documentation graph and MCP tool server. The client decides how to reason; Cairn keeps discovery, ranking, evidence, and section retrieval grounded in the repository.

A repository becomes a navigable documentation graph.

The repo is the unit of operation.