System Design

roam-code Architecture

roam-code is a local-first analysis system: parse source once, store structural facts in SQLite, and expose deterministic query primitives to CLI and MCP clients.

Design goal: minimize repeated file scanning for agents by turning codebases into a queryable structural index.

Architecture Diagram

  1. Repository Inputs: source files; Git history / blame.
  2. Index Pipeline: discover -> parse -> symbols -> edges -> complexity -> metrics; incremental, O(changed).
  3. Storage Layer: .roam/index.db (SQLite); files, symbols, edges, metrics, snapshots, vulns.
  4. Analysis Engines: graph analytics (PageRank, SCC, Louvain); rules engine (built-in + YAML packs); security, smells, trends, traces.
  5. Interfaces: CLI commands (139 canonical); MCP tools/resources/prompts; JSON/SARIF/text envelopes.
  6. Consumers: developers (terminal + CI); AI agents via MCP (Claude, Cursor, Codex CLI); automation (hooks, workflows, dashboards).

Key property: query-time latency stays low because expensive parsing happens during index builds, not on every command call.
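To make the split between index-time and query-time work concrete, here is a minimal sketch using Python's standard sqlite3 module. The table layout, the `build_index` and `callers_of` helpers, and the sample symbols are all assumptions made for illustration; the actual schema of .roam/index.db is not described here and may differ.

```python
import sqlite3

# Hypothetical, simplified schema; the real .roam/index.db layout may differ.
SCHEMA = """
CREATE TABLE symbols (id INTEGER PRIMARY KEY, name TEXT NOT NULL, file TEXT NOT NULL);
CREATE TABLE edges (src INTEGER NOT NULL, dst INTEGER NOT NULL, kind TEXT NOT NULL);
CREATE INDEX idx_edges_dst ON edges(dst);
"""


def build_index(conn, symbols, edges):
    """Index-time work: store structural facts once, after parsing."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO symbols (id, name, file) VALUES (?, ?, ?)", symbols)
    conn.executemany("INSERT INTO edges (src, dst, kind) VALUES (?, ?, ?)", edges)
    conn.commit()


def callers_of(conn, name):
    """Query-time work: an indexed lookup that reads no source files."""
    rows = conn.execute(
        """
        SELECT caller.name
        FROM edges
        JOIN symbols AS callee ON callee.id = edges.dst
        JOIN symbols AS caller ON caller.id = edges.src
        WHERE callee.name = ? AND edges.kind = 'call'
        ORDER BY caller.name
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
build_index(
    conn,
    symbols=[
        (1, "AuthService", "auth.py"),
        (2, "login", "views.py"),
        (3, "logout", "views.py"),
    ],
    edges=[(2, 1, "call"), (3, 1, "call")],
)
print(callers_of(conn, "AuthService"))  # ['login', 'logout']
```

Because the query is a plain SQL lookup with an explicit ORDER BY, repeated calls against the same index return the same answer in the same order, which is the sense in which the query primitives are deterministic.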

Subsystem Responsibilities

| Subsystem | Main modules | Responsibility |
|---|---|---|
| Index Pipeline | index/indexer.py, index/parser.py, index/symbols.py | Build and refresh the structural index from source + git. |
| Storage | db/schema.py, db/connection.py | SQLite schema, migrations, batched query helpers. |
| Graph Intelligence | graph/builder.py, graph/layers.py, graph/clusters.py | Centrality, layering, communities, cycle analysis. |
| Rule Engine | rules/builtin.py, rules/engine.py, rules/ast_match.py | Built-in rules plus user-defined YAML packs (path/symbol/AST/dataflow). |
| Interfaces | commands/cmd_*.py, mcp_server.py | Expose deterministic queries to CLI and MCP clients. |
| Output Contracts | output/formatter.py, output/sarif.py | Stable text/JSON/SARIF envelopes for agents and CI. |
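As an illustration of the cycle analysis listed under Graph Intelligence, the sketch below finds dependency cycles as strongly connected components (SCC) using Tarjan's algorithm. It is a generic textbook version over a plain edge list, not the code in graph/builder.py; the function names and the sample edges are hypothetical.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Tarjan's algorithm over (src, dst) pairs; returns a list of sorted components.

    Recursive for brevity, so very deep graphs would need an iterative variant.
    """
    graph = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        graph[src].append(dst)
        nodes.update((src, dst))

    index_of, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index_of[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for nxt in graph[node]:
            if nxt not in index_of:
                visit(nxt)
                lowlink[node] = min(lowlink[node], lowlink[nxt])
            elif nxt in on_stack:
                lowlink[node] = min(lowlink[node], index_of[nxt])
        if lowlink[node] == index_of[node]:
            # node is the root of a component: pop its members off the stack.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(sorted(component))

    for node in sorted(nodes):  # sorted start order keeps the output stable
        if node not in index_of:
            visit(node)
    return components


def dependency_cycles(edges):
    """Keep only components that contain a cycle (size > 1, or a self-loop)."""
    self_loops = {src for src, dst in edges if src == dst}
    return sorted(
        c for c in strongly_connected_components(edges)
        if len(c) > 1 or c[0] in self_loops
    )


sample_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
print(dependency_cycles(sample_edges))  # [['a', 'b', 'c']]
```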

Index Pipeline Stages

  1. Discovery: collect tracked files and classify file roles.
  2. Parsing: tree-sitter parse per file with language routing.
  3. Extraction: symbols, signatures, docstrings, references.
  4. Resolution: convert references into graph edges.
  5. Metrics: complexity, centrality, churn, co-change, snapshots.
  6. Persistence: upsert into SQLite with incremental diffing.
discover -> parse -> extract -> resolve -> metrics -> persist
                 (incremental path executes only changed files)
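
The incremental path can be sketched as follows. This version assumes changes are detected by comparing content hashes against the hashes stored at the last build; the source does not say how roam-code detects changed files, so the hashing strategy and the names `plan_incremental`, `refresh`, and `process_file` are illustrative only.

```python
import hashlib


def content_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_incremental(stored_hashes, current_files):
    """Compare stored hashes with current contents; return (changed, removed) paths."""
    changed = sorted(
        path
        for path, text in current_files.items()
        if stored_hashes.get(path) != content_hash(text)
    )
    removed = sorted(set(stored_hashes) - set(current_files))
    return changed, removed


def refresh(stored_hashes, current_files, process_file):
    """Run the per-file stages only for new or changed files, then update hashes."""
    changed, removed = plan_incremental(stored_hashes, current_files)
    for path in changed:
        # Stand-in for parse -> extract -> resolve -> metrics -> persist.
        process_file(path, current_files[path])
        stored_hashes[path] = content_hash(current_files[path])
    for path in removed:
        del stored_hashes[path]
    return changed, removed


files = {"a.py": "def a(): pass\n", "b.py": "def b(): pass\n"}
hashes = {}
processed = []


def record(path, text):
    processed.append(path)


first_changed, _ = refresh(hashes, files, record)  # full build: both files
files["b.py"] = "def b(): return 1\n"
second_changed, second_removed = refresh(hashes, files, record)  # only b.py
print(first_changed, second_changed, second_removed)
```

The cost of the second run is proportional to the number of changed files, which matches the O(changed) note in the architecture diagram.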

Command to Data Flow

Example: roam preflight AuthService

CLI cmd_preflight
  -> ensure_index()
  -> query symbols/edges/metrics
  -> run health/rule checks
  -> aggregate verdict + risk factors
  -> render text or JSON envelope
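
The same flow, written out as a small Python sketch. Everything here is a stand-in: the `Index` dataclass replaces the SQLite index, and the thresholds, verdict labels, and envelope keys are assumptions chosen for illustration rather than the actual behavior of cmd_preflight.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Index:
    """Hypothetical in-memory stand-in for the SQLite index."""
    callers: dict = field(default_factory=dict)     # symbol -> list of caller names
    complexity: dict = field(default_factory=dict)  # symbol -> complexity score


def ensure_index(index):
    # A real implementation would build or refresh the index here.
    if index is None:
        raise RuntimeError("index not built")
    return index


def preflight(index, symbol, max_callers=2, max_complexity=10):
    index = ensure_index(index)
    # Query symbols/edges/metrics.
    callers = index.callers.get(symbol, [])
    complexity = index.complexity.get(symbol, 0)
    # Run checks and collect risk factors (thresholds are illustrative).
    risk_factors = []
    if len(callers) > max_callers:
        risk_factors.append(f"high fan-in: {len(callers)} callers")
    if complexity > max_complexity:
        risk_factors.append(f"high complexity: {complexity}")
    # Aggregate a verdict.
    verdict = "review" if risk_factors else "ok"
    return {"command": "preflight", "symbol": symbol,
            "verdict": verdict, "risk_factors": risk_factors}


def render(result, as_json=False):
    """Render the same result as a JSON envelope or as plain text."""
    if as_json:
        return json.dumps(result, sort_keys=True)
    lines = [f"{result['symbol']}: {result['verdict']}"]
    lines += [f"  - {factor}" for factor in result["risk_factors"]]
    return "\n".join(lines)


index = Index(
    callers={"AuthService": ["login", "logout", "refresh_token"]},
    complexity={"AuthService": 14},
)
result = preflight(index, "AuthService")
print(render(result))
print(render(result, as_json=True))
```

Keeping the verdict computation separate from rendering means the text and JSON outputs are always derived from the same result.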

Tradeoff: static structure gives speed and determinism, but it cannot model fully dynamic runtime behavior without trace ingestion.