cool project. one thing most fleet tools miss: session persistence and replay matter more than the dashboard. when an agent goes sideways at 2am you want to scrub back through exactly what it did, not stare at a pretty graph. also cross-agent context -- knowing agent B is blocked waiting on agent A's output -- that's where coordination actually breaks down. curious how you handle that handoff visibility?
On session persistence & replay:
Honest answer: full scrub-back replay isn't built yet, but the foundation is deliberate. Terminal output flows through a relay ring buffer (256KB / 200 messages), enough for a late-joining browser to catch up mid-session. The runner side keeps a styled virtual terminal with scrollback history per pod. What's not there yet is durable server-side recording: the backend deliberately doesn't persist terminal bytes. What is there is agent session resume: when a pod goes sideways, you can spin up a new pod that inherits the same git worktree and agent conversation context (Claude Code --session-id), so the agent picks up where it left off rather than starting cold. The replay gap is real and on the roadmap.
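To make the catch-up mechanism concrete, here's a minimal sketch of a relay-side ring buffer that caps both message count and total bytes. The class and method names are hypothetical (not from the actual codebase); only the 256KB / 200-message limits come from the description above.

```python
from collections import deque


class RelayRingBuffer:
    """Bounded replay buffer: evicts oldest chunks whenever either
    the message-count cap or the total-byte cap is exceeded."""

    def __init__(self, max_messages: int = 200, max_bytes: int = 256 * 1024):
        self.max_messages = max_messages
        self.max_bytes = max_bytes
        self._buf: deque[bytes] = deque()
        self._bytes = 0

    def push(self, chunk: bytes) -> None:
        self._buf.append(chunk)
        self._bytes += len(chunk)
        # Evict from the head until both caps are satisfied again.
        while len(self._buf) > self.max_messages or self._bytes > self.max_bytes:
            self._bytes -= len(self._buf.popleft())

    def replay(self) -> list[bytes]:
        """Everything a late-joining subscriber needs to catch up."""
        return list(self._buf)
```

A browser connecting mid-session would first receive `replay()` in order, then subscribe to the live stream, which is why the buffer only needs to hold the recent window rather than the whole session.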
On cross-agent handoff visibility:
This is where the implementation is actually solid. Three things work together:
1. Agent status detection — the runner analyzes PTY output in real-time with a multi-signal detector and emits executing / waiting / idle states, which are written to the pod DB and pushed to the frontend.
So you literally see agent B sitting in waiting state on the topology view.
2. Pod bindings — the permission layer that makes handoffs explicit. Agent A requests a binding to agent B with scopes (terminal:read, terminal:write). Binding status (pending → active → inactive) is
persisted in Postgres, so the edge between nodes in the topology graph has state. A pending binding that hasn't been accepted is visible — that's your "B is waiting on A" signal.
3. Mesh topology API — computes a live graph of all pods (nodes) and their bindings (edges), including what scopes are granted vs. pending. The frontend renders this in real-time so you can see the
coordination graph without digging through logs.
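The three pieces above compose into something like the following sketch. All names, shapes, and the toy status heuristics are hypothetical illustrations, not the actual implementation; the real detector analyzes PTY output with more signals, and the real binding/topology state lives in Postgres.

```python
import re
from dataclasses import dataclass
from enum import Enum


class BindingStatus(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    INACTIVE = "inactive"


@dataclass
class Binding:
    requester: str            # pod asking for access, e.g. agent B
    target: str               # pod being bound to, e.g. agent A
    scopes: tuple[str, ...]   # e.g. ("terminal:read", "terminal:write")
    status: BindingStatus = BindingStatus.PENDING


def detect_status(recent_output: str, last_output_age_s: float) -> str:
    """Toy multi-signal detector: a prompt-like tail means the agent is
    waiting on input, fresh output means executing, silence means idle."""
    if re.search(r"[>$?]\s*$", recent_output):
        return "waiting"
    if last_output_age_s < 5:
        return "executing"
    return "idle"


def topology(pods: list[str], bindings: list[Binding]) -> dict:
    """Mesh graph: pods are nodes, non-inactive bindings are edges that
    carry their grant state, so stalls are visible without log-digging."""
    edges = [
        {"from": b.requester, "to": b.target,
         "scopes": list(b.scopes), "status": b.status.value}
        for b in bindings if b.status != BindingStatus.INACTIVE
    ]
    return {"nodes": list(pods), "edges": edges}


def blocked_on(pod: str, bindings: list[Binding]) -> list[str]:
    """The 'B is waiting on A' signal: bindings this pod requested
    that haven't been accepted yet."""
    return [b.target for b in bindings
            if b.requester == pod and b.status == BindingStatus.PENDING]
```

The design point is that the edge itself has state: a pending binding is rendered differently from an active one, so the frontend can show "B is waiting on A" directly from the graph rather than inferring it from log timing.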
Channel messages between agents are also DB-persisted (unlike terminal output), so the coordination dialogue survives pod restarts. The gap is that terminal output isn't currently durable beyond the
session — that's the honest tradeoff we made in favor of real-time streaming performance. Replay is the next thing to close.
the binding state as deadlock signal is clever. terminal durability is the obvious next shoe — glad you know it.