[Dictated by voice, edited with Claude. It's 2026 — everything is.]
I've spent 15+ years managing development teams. A year ago I started building a SaaS product solo using Claude Code as my entire engineering team. After months of iteration, I landed on a multi-agent architecture that actually works.
One Claude Code instance is powerful but limited. Give it a big task and it either oversimplifies or loses critical details. You also can't have the same agent building frontend and backend simultaneously. The workflow has to be sequential: design the frontend first as pure HTML until you're happy with it, extract requirements from the pages, then task the backend. This pipeline only works when each stage has a dedicated agent that stays in its lane.
THE CEO INCIDENT
Before I explain the architecture, let me tell you about my first attempt at delegation.
I "hired" a CEO agent. Gave it broad authority to organize the project. It went wild. Within hours it rushed to create 20 roles — CTO, DevOps Lead, QA Engineer, Helper Tester, Documentation Specialist — and wrote detailed technical regulations for each one. Then the agents started writing memos to each other. Scheduling alignment meetings. Running brainstorming sessions. Work completely stopped while the agents were busy managing.
I went through the entire investor arc — the one that usually takes founders 3–5 years — in about a day and a half. From the hopeful optimism of hiring a CEO, to watching the org chart explode, to complete disillusionment, to demoting the CEO back to a regular worker and taking back full control.
Lesson learned: agents will happily create organizational complexity forever. You have to constrain them hard. The CEO now has one rule above all others: you don't code.
THE ARCHITECTURE
I run 3 separate Claude Code agents in Docker containers, each with its own CLAUDE.md and strict role:
- Backend: Go API, database, migrations, Telegram Ads integration. Access to /workspace/back/ + reads /workspace/docs/
- Frontend: HTML templates, JS, CSS. Access to /workspace/front/ + reads /workspace/docs/
- CEO: Strategy, content, marketing, task coordination. Reads code but never writes it. Access to /workspace/docs/ only.
Each agent's CLAUDE.md defines its role, tech stack, conventions, and what it's NOT allowed to touch. The backend agent knows it uses sqlb + pgx (no ORM), avoids pointers in Go, and that the schema file is the source of truth. The CEO — after the incident — knows it doesn't code. Period.
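As a rough illustration, the backend CLAUDE.md has roughly this shape (the wording below is invented for this post, not copied from my actual file):

```markdown
# Role: Backend Engineer

You own /workspace/back/. You may read /workspace/docs/,
but you write only to your own task and message files there.

## Stack
- Go, sqlb + pgx. No ORM. The schema file is the source of truth.
- Avoid pointers unless a nil state is genuinely meaningful.

## You never
- Touch HTML, CSS, or JS.
- Modify anything under /workspace/front/.
- Create new roles, processes, or meetings.

## Workflow
- Read docs/lessons-learned.md before starting any task.
- Pick up tasks from docs/tasks/backend.md. Delete entries when done.
```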
THE PIPELINE
Strictly sequential, not parallel:
1. Frontend first. I direct the frontend agent to build a page as pure HTML + CSS + JS. No API calls, just demo data. I iterate until I like how it looks.
2. Extract requirements. From the finished page, it's clear exactly what data the backend needs.
3. Task to backend. A task goes into docs/tasks/backend.md with the endpoint spec derived from the actual UI.
4. Coordinate. If the backend needs clarifications, agents exchange messages through docs/messages/.
5. Connect. Frontend wires up real API calls to replace demo data.
The UI is always the source of truth. Backend serves what the UI needs — not the other way around.
RESULTS
Using this setup, I built and deployed a full production SaaS (growity.ai — Telegram Ads automation):
- 24 interactive HTML pages (dashboard, campaign wizard, analytics, billing)
- Go backend with 60+ API endpoints, 311 database migrations, background jobs
- Kubernetes deployment with TLS, nginx, PostgreSQL
- AI ad copy generation, A/B testing engine, automated bid optimization
- Stripe + PayPal billing
One person. No designers, no frontend devs, no backend devs. Just me directing agents.
Happy to answer questions. More details on coordination and rules in a comment below.
HOW AGENTS COORDINATE
Agents don't talk to each other directly. They coordinate through a shared docs repo:
docs/
├── tasks/ — task files per role (backend.md, frontend.md, ceo.md)
├── messages/ — inter-agent correspondence (dated markdown files)
├── decisions.md — architecture decision log
├── guidelines/ — coding conventions (Go, SQL, workflow)
├── reference/ — API spec, data model
└── tracking/ — workplan, sprint board
Task flow: CEO writes a task in docs/tasks/backend.md with objective, context, and acceptance criteria. Backend implements it, commits. If backend needs frontend changes, it writes a task in docs/tasks/frontend.md. Completed tasks get deleted — source of truth is the code, not task history. Important decisions get logged in decisions.md before deletion.
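A task entry in that shape looks something like this (the content here is invented for illustration):

```markdown
## Task: Campaign list endpoint

Objective: GET /api/campaigns returning the fields the dashboard
page already renders.

Context: front/pages/dashboard.html shows name, status, daily budget,
spend, and CTR per campaign, currently from demo data.

Acceptance criteria:
- Response JSON matches the demo data shape exactly.
- Paginated, 20 per page.
- Covered by a handler test.
```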
Messages: when agents need to discuss something, they write dated markdown files in docs/messages/. Messages get deleted once resolved — no accumulation.
Each agent has a /go skill — a custom Claude Code command that tells the agent to read its task file and messages inbox, then execute. I type /go, the agent picks up its tasks and works. But I review every output before the next step. Manual trigger, manual review, manual approval.
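Custom slash commands in Claude Code are just markdown files under .claude/commands/. The /go command is roughly a prompt like this (wording illustrative):

```markdown
<!-- .claude/commands/go.md -->
1. Read your task file for your role in /workspace/docs/tasks/.
2. Read any messages addressed to you in /workspace/docs/messages/.
3. Answer open messages first, then execute the top task.
4. Stay inside your role boundaries from CLAUDE.md.
5. When a task is complete: commit, delete the task entry, and stop.
   Do not invent new tasks.
```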
I also tried full auto-orchestration with watchexec — file changes trigger agents automatically, CEO creates tasks, agents pick them up on their own. I pushed it as a separate branch (https://github.com/yury-egorenkov/claude-code-docker/tree/mu...) but rolled back. Result: all agents stopped doing real work and started generating huge status reports for each other. The coordination overhead ate all the productivity. For now, manual control wins.
KEY RULES
After months of trial and error:
1. Separate git repos. /workspace/back/, /workspace/front/, /workspace/docs/ — each has its own .git. Never a monorepo.
2. Research before tasking. The #1 recurring mistake: an agent writes a task assuming something isn't implemented, when it already is. Rule: always read the actual code before creating tasks.
3. Small tasks only. Large tasks lead to large mistakes. Every task must be small enough to review in one pass.
4. Lessons learned file. docs/lessons-learned.md captures mistakes that keep recurring. Every agent reads it before starting. This file alone probably saved me more time than anything else.
5. No boundary crossing. Backend never touches HTML. Frontend never writes SQL. CEO never writes code.
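For a sense of what goes in the lessons-learned file: entries are one-liners, cause plus rule (these examples are invented, but representative):

```markdown
# Lessons Learned

- Created a duplicate endpoint because the task assumed it didn't exist.
  Rule: grep the routes before writing any task.
- A migration renamed a column the frontend still referenced.
  Rule: search front/ for the column name before renaming.
- A "small refactor" touched 14 files.
  Rule: if a task touches more than ~3 files, split it.
```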
WHY MARKDOWN OVER STRUCTURED TOOLS
I looked at task tracking tools designed for agents: JSONL-based, SQLite caching, dependency trees. But plain markdown wins: it's readable in any editor, diffs are clean in git, there's no extra tooling to install or break, and it serves as both coordination AND documentation.
Everyone's holding the same genie, but everyone's rubbing the lamp differently. Maybe the best approach doesn't exist yet.
THE DOCKER SETUP
Open-source: https://github.com/yury-egorenkov/claude-code-docker
- Network restrictions — iptables firewall allows only GitHub, npm, Anthropic APIs. No random outbound connections.
- Filesystem isolation — only workspace directory is mounted.
- State persistence — credentials and conversation history survive container restarts.
- Reproducible environment — Node.js, Go, Git, GitHub CLI, PostgreSQL client, all pre-installed.
WHAT I'D DO DIFFERENTLY
- Start with the docs repo and role constraints on day one. I added the coordination layer after hitting walls. Should have been the foundation from the start.
- Write the lessons-learned file immediately. Agents repeat the same mistakes across sessions.
- Never give an agent open-ended organizational authority. An agent with vague authority will build bureaucracy, not product.
One thing that keeps surprising me: patterns we spent decades perfecting for human cognition are becoming irrelevant — or even counterproductive — when agents do the work.
We fought duplication our entire careers. DRY was gospel, because humans can't efficiently look at multiple places at once or hold the full picture in their heads. Abstractions, libraries, frameworks — these were mental crutches, ways to reduce cognitive load. You couldn't read the entire codebase every time, so you built layers to avoid having to.
Agents don't have that limitation. Reading the whole thing again is not tedious for them. It's instant. So duplication, which we always treated as a bug, can actually make a system less fragile. Two independent implementations of the same logic means changing one doesn't silently break the other. The coupling we introduced to eliminate duplication was often worse than the duplication itself — we just couldn't see it because we were optimizing for human brains.
Documentation is another one. The eternal problem of docs drifting out of sync with code — that's a human problem. We needed docs because we couldn't parse the full codebase fast enough. But documentation is just an index. In principle, you don't need it at all — it can be reconstructed from code on the fly. Code is the source of truth for the current state of the system. The sync problem disappears when your reader can regenerate the index in seconds.
This is also why I use plain text and terminal for everything. Sublime Text instead of an IDE. Markdown files instead of project management tools. No fancy abstractions that hide what's actually happening. Everything in my setup is human-readable, inspectable, and changeable. I've always been a believer in command line, text documents, and reproducibility — and it turns out this approach maps perfectly to working with agents. They don't need GUIs. They need readable state.
This also changes how I think about code quality. I run forward fast — build features, accumulate mess, don't worry about duplication while shipping. Then when the codebase gets unwieldy, I just say "now refactor this" or "remove the duplication here." I give hints on the direction, and the agent reworks large volumes of code in minutes. I tried building refactoring into the workflow automatically — it didn't work. The agent would over-optimize prematurely, or refactor things that were still changing. So I accepted the pattern: sprint forward, then clean up. Tests too — add them when you need confidence in a specific area, not as a ritual on every commit.
The irony: the tools and patterns we built to help humans think are now getting in the way of agents working. Less abstraction, more duplication, plain text over structured tools, mess now and clean later — it feels wrong after 15 years of engineering discipline, but it works.