System Design
roam-code Architecture
roam-code is a local-first analysis system: parse source once, store structural facts in SQLite, and expose deterministic query primitives to CLI and MCP clients.
Design goal: minimize repeated file scanning for agents by turning codebases into a queryable structural index.
Architecture Diagram
Key property: Query-time latency stays low because expensive parsing happens during index builds, not on every command call.
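To make the "parse once, query many times" idea concrete, here is a minimal sketch of a structural index in SQLite. The table names, column names, and sample rows are assumptions chosen for illustration; they are not taken from roam-code's actual schema in db/schema.py.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE symbols (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    kind TEXT NOT NULL,
    file TEXT NOT NULL
);
CREATE TABLE edges (
    src  INTEGER NOT NULL REFERENCES symbols(id),
    dst  INTEGER NOT NULL REFERENCES symbols(id),
    kind TEXT NOT NULL
);
CREATE INDEX idx_edges_dst ON edges(dst);
"""


def build_demo_index() -> sqlite3.Connection:
    """Stand-in for an index build: facts are written once, up front."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO symbols (id, name, kind, file) VALUES (?, ?, ?, ?)",
        [
            (1, "AuthService", "class", "auth/service.py"),
            (2, "login_view", "function", "web/views.py"),
            (3, "refresh_token", "function", "auth/tokens.py"),
        ],
    )
    conn.executemany(
        "INSERT INTO edges (src, dst, kind) VALUES (?, ?, ?)",
        [(2, 1, "call"), (3, 1, "call")],
    )
    conn.commit()
    return conn


def callers_of(conn: sqlite3.Connection, name: str) -> list:
    """Answer a structural question from stored facts, without re-parsing source."""
    rows = conn.execute(
        """
        SELECT s.name
        FROM edges e
        JOIN symbols s ON s.id = e.src
        JOIN symbols t ON t.id = e.dst
        WHERE t.name = ?
        ORDER BY s.name
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]
```

The `ORDER BY` clause keeps results in a stable order, which is one simple way a query primitive can stay deterministic across runs.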
Subsystem Responsibilities
| Subsystem | Main modules | Responsibility |
|---|---|---|
| Index Pipeline | index/indexer.py, index/parser.py, index/symbols.py | Build and refresh the structural index from source + git. |
| Storage | db/schema.py, db/connection.py | SQLite schema, migrations, batched query helpers. |
| Graph Intelligence | graph/builder.py, graph/layers.py, graph/clusters.py | Centrality, layering, communities, cycle analysis. |
| Rule Engine | rules/builtin.py, rules/engine.py, rules/ast_match.py | Built-in rules plus user-defined YAML packs (path/symbol/AST/dataflow). |
| Interfaces | commands/cmd_*.py, mcp_server.py | Expose deterministic queries to CLI and MCP clients. |
| Output Contracts | output/formatter.py, output/sarif.py | Stable text/JSON/SARIF envelopes for agents and CI. |
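As an illustration of the layering and cycle analysis listed under Graph Intelligence, the sketch below groups modules into dependency layers using Python's standard `graphlib` module. It is not the algorithm used in graph/layers.py; the `layers` function and the sample module names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def layers(deps: dict) -> list:
    """Group nodes into layers; each layer depends only on earlier layers.

    `deps` maps a node to the set of nodes it depends on.
    Raises graphlib.CycleError if the graph contains a cycle.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # cycle detection happens here
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for stable output
        result.append(ready)
        sorter.done(*ready)
    return result


def has_cycle(deps: dict) -> bool:
    """Return True if the dependency graph contains at least one cycle."""
    try:
        layers(deps)
    except CycleError:
        return True
    return False
```

For example, `layers({"api": {"service"}, "service": {"db"}, "db": set()})` places `db` in the lowest layer and `api` in the highest.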
Index Pipeline Stages
- Discovery: collect tracked files and classify file roles.
- Parsing: tree-sitter parse per file with language routing.
- Extraction: symbols, signatures, docstrings, references.
- Resolution: convert references into graph edges.
- Metrics: complexity, centrality, churn, co-change, snapshots.
- Persistence: upsert into SQLite with incremental diffing.
discover -> parse -> extract -> resolve -> metrics -> persist
(incremental path executes only changed files)
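The incremental path can be sketched as follows, assuming changed files are detected by comparing content hashes against the hashes recorded at the last build. The change-detection strategy, the function names, and the toy "extract" step (collecting top-level `def` names) are all assumptions for illustration; the real pipeline parses with tree-sitter and may detect changes differently.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hash file contents so unchanged files can be skipped."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_files(files: dict, stored_hashes: dict) -> list:
    """Return paths whose content differs from the last indexed version."""
    return sorted(
        path
        for path, text in files.items()
        if stored_hashes.get(path) != content_hash(text)
    )


def extract_symbols(text: str) -> list:
    """Toy stand-in for parse + extract: collect top-level function names."""
    return [
        line.split("(")[0][len("def "):].strip()
        for line in text.splitlines()
        if line.startswith("def ")
    ]


def run_incremental(files: dict, stored_hashes: dict, index: dict) -> list:
    """Re-index only changed files and drop entries for deleted ones."""
    todo = changed_files(files, stored_hashes)
    for path in todo:
        index[path] = extract_symbols(files[path])
        stored_hashes[path] = content_hash(files[path])
    for path in list(index):
        if path not in files:  # file was removed from the repository
            del index[path]
            stored_hashes.pop(path, None)
    return todo
```

A second run over unchanged input returns an empty list, so the cost of a refresh scales with the size of the change rather than the size of the repository.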
Command to Data Flow
Example: roam preflight AuthService
CLI cmd_preflight
-> ensure_index()
-> query symbols/edges/metrics
-> run health/rule checks
-> aggregate verdict + risk factors
-> render text or JSON envelope
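The last three steps of this flow (checks, verdict aggregation, rendering) might look roughly like the sketch below. It assumes the index already exists and that symbol facts have been queried from it. The `SymbolFacts` fields, the thresholds, the verdict labels, and the envelope keys are hypothetical and do not reflect the actual preflight output contract.

```python
import json
from dataclasses import dataclass


@dataclass
class SymbolFacts:
    """Hypothetical facts read from the index for one symbol."""
    name: str
    fan_in: int
    complexity: int
    in_cycle: bool


def risk_factors(facts: SymbolFacts, max_fan_in: int = 10, max_complexity: int = 15) -> list:
    """Run simple checks; thresholds are illustrative defaults."""
    factors = []
    if facts.fan_in > max_fan_in:
        factors.append("high fan-in")
    if facts.complexity > max_complexity:
        factors.append("high complexity")
    if facts.in_cycle:
        factors.append("part of dependency cycle")
    return factors


def preflight(facts: SymbolFacts, as_json: bool = False) -> str:
    """Aggregate a verdict and render it as text or as a JSON envelope."""
    factors = risk_factors(facts)
    verdict = "review" if factors else "ok"
    if as_json:
        envelope = {
            "command": "preflight",
            "symbol": facts.name,
            "verdict": verdict,
            "risk_factors": factors,
        }
        # sort_keys keeps the serialized envelope byte-stable across runs
        return json.dumps(envelope, sort_keys=True)
    suffix = f" ({', '.join(factors)})" if factors else ""
    return f"{facts.name}: {verdict}{suffix}"
```

Because the verdict is a pure function of stored facts, the same index state always yields the same output, which is what lets agents and CI rely on it.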
Tradeoff: Static structure gives speed and determinism, but it cannot model fully dynamic runtime behavior without trace ingestion.