cool project. one thing most fleet tools miss: session persistence and replay matter more than the dashboard. when an agent goes sideways at 2am you want to scrub back through exactly what it did, not stare at a pretty graph. also cross-agent context -- knowing agent B is blocked waiting on agent A's output -- that's where coordination actually breaks down. curious how you handle that handoff visibility?
On session persistence & replay:
Honest answer: full scrub-back replay isn't built yet, but the foundation is deliberate. Terminal output flows through a relay ring buffer (256KB / 200 messages), enough for a late-joining browser to catch up mid-session. The runner side keeps a styled virtual terminal with scrollback history per pod. What's not there yet is durable server-side recording: the backend deliberately doesn't persist terminal bytes. What is there is agent session resume: when a pod goes sideways, you can spin up a new pod that inherits the same git worktree and agent conversation context (Claude Code --session-id), so the agent picks up where it left off rather than starting cold. The replay gap is real and on the roadmap.
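To make the catch-up mechanism concrete, here's a minimal sketch of a relay-side ring buffer that caps both message count and total bytes. The class and method names are hypothetical (not from the actual codebase); only the 256KB / 200-message limits come from the description above.

```python
from collections import deque


class RelayRingBuffer:
    """Bounded replay buffer: evicts oldest chunks whenever either
    the message-count cap or the total-byte cap is exceeded."""

    def __init__(self, max_messages: int = 200, max_bytes: int = 256 * 1024):
        self.max_messages = max_messages
        self.max_bytes = max_bytes
        self._buf: deque[bytes] = deque()
        self._bytes = 0

    def push(self, chunk: bytes) -> None:
        self._buf.append(chunk)
        self._bytes += len(chunk)
        # Evict from the head until both caps are satisfied again.
        while len(self._buf) > self.max_messages or self._bytes > self.max_bytes:
            self._bytes -= len(self._buf.popleft())

    def replay(self) -> list[bytes]:
        """Everything a late-joining subscriber needs to catch up."""
        return list(self._buf)
```

A browser connecting mid-session would first receive `replay()` in order, then subscribe to the live stream, which is why the buffer only needs to hold the recent window rather than the whole session.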
On cross-agent handoff visibility:
This is where the implementation is actually solid. Three things work together:
1. Agent status detection — the runner analyzes PTY output in real-time with a multi-signal detector and emits executing / waiting / idle states, which are written to the pod DB and pushed to the frontend.
So you literally see agent B sitting in waiting state on the topology view.
2. Pod bindings — the permission layer that makes handoffs explicit. Agent A requests a binding to agent B with scopes (terminal:read, terminal:write). Binding status (pending → active → inactive) is
persisted in Postgres, so the edge between nodes in the topology graph has state. A pending binding that hasn't been accepted is visible — that's your "B is waiting on A" signal.
3. Mesh topology API — computes a live graph of all pods (nodes) and their bindings (edges), including what scopes are granted vs. pending. The frontend renders this in real-time so you can see the
coordination graph without digging through logs.
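The three pieces above compose into something like the following sketch. All names, shapes, and the toy status heuristics are hypothetical illustrations, not the actual implementation; the real detector analyzes PTY output with more signals, and the real binding/topology state lives in Postgres.

```python
import re
from dataclasses import dataclass
from enum import Enum


class BindingStatus(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    INACTIVE = "inactive"


@dataclass
class Binding:
    requester: str            # pod asking for access, e.g. agent B
    target: str               # pod being bound to, e.g. agent A
    scopes: tuple[str, ...]   # e.g. ("terminal:read", "terminal:write")
    status: BindingStatus = BindingStatus.PENDING


def detect_status(recent_output: str, last_output_age_s: float) -> str:
    """Toy multi-signal detector: a prompt-like tail means the agent is
    waiting on input, fresh output means executing, silence means idle."""
    if re.search(r"[>$?]\s*$", recent_output):
        return "waiting"
    if last_output_age_s < 5:
        return "executing"
    return "idle"


def topology(pods: list[str], bindings: list[Binding]) -> dict:
    """Mesh graph: pods are nodes, non-inactive bindings are edges that
    carry their grant state, so stalls are visible without log-digging."""
    edges = [
        {"from": b.requester, "to": b.target,
         "scopes": list(b.scopes), "status": b.status.value}
        for b in bindings if b.status != BindingStatus.INACTIVE
    ]
    return {"nodes": list(pods), "edges": edges}


def blocked_on(pod: str, bindings: list[Binding]) -> list[str]:
    """The 'B is waiting on A' signal: bindings this pod requested
    that haven't been accepted yet."""
    return [b.target for b in bindings
            if b.requester == pod and b.status == BindingStatus.PENDING]
```

The design point is that the edge itself has state: a pending binding is rendered differently from an active one, so the frontend can show "B is waiting on A" directly from the graph rather than inferring it from log timing.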
Channel messages between agents are also DB-persisted (unlike terminal output), so the coordination dialogue survives pod restarts. The gap is that terminal output isn't currently durable beyond the
session — that's the honest tradeoff we made in favor of real-time streaming performance. Replay is the next thing to close.
the binding state as deadlock signal is clever. terminal durability is the obvious next shoe — glad you know it.