In regards to actually using the tech:
- version control and maybe worktrees
- sub-agents are pretty nice to have, Claude Code also introduced support for longer running workflows
- throw as much tooling as possible at the project, like Oxlint, Oxfmt etc., for Python it might be Ruff and ty or Pyright or whatever
- throw as much testing as possible at the project, maybe require certain coverage or just have CLAUDE.md that nudges the models to write and run tests
- throw as many additional scripts at the project as you want, e.g. how you want the architecture to be laid out, max file length limits etc., whatever common tools don't cover
- some tools also support LSP, use those when possible
- pretty much all models will still output slop, though making fresh instances (even of the same model) review its output, e.g. 3 parallel sub-agents looking for critical/serious issues works pretty well, I just have a review loop that I make the models run before commits
- ideally you'd also test local instances of whatever you build (e.g. real PostgreSQL instance etc.), just so the dev loops are tighter and faster
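As an illustration of the "additional scripts" bullet above, here is a minimal sketch of a max-file-length check in Python. The function name, the default suffixes, and the idea of running it from a pre-commit hook are all assumptions for the example, not something the list prescribes.

```python
from pathlib import Path


def oversized_files(
    root: Path, max_lines: int, suffixes: tuple[str, ...] = (".py",)
) -> list[tuple[Path, int]]:
    """Return (path, line_count) for every matching file longer than max_lines."""
    offenders = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            # Count lines lazily so large files are not loaded into memory at once.
            with path.open(encoding="utf-8", errors="replace") as handle:
                count = sum(1 for _ in handle)
            if count > max_lines:
                offenders.append((path, count))
    return offenders
```

A hook or CI step could call `oversized_files(Path("."), 400)` and fail when the list is non-empty, which gives the agent a concrete, mechanical signal to split a file.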
I have been using a Spec Driven Development approach, implemented as a Claude Code plugin, since Feb for all mid-size and larger tasks. The idea is to write detailed specs first, with the agent helping through research and interviewing, then decompose the task into smaller subtasks, write a detailed spec for each subtask, and implement each subtask separately. You can restart the session after every step in the workflow and after each subtask implementation, since all requirements are materialized in the specs. This helps keep the session context focused on a single task at a time, improves adherence, reduces cost, and makes it possible to implement bigger tasks that are hard to implement with pure plan + code.
Discussion on hn: https://news.ycombinator.com/item?id=48231575
Repo: https://github.com/sermakarevich/sddw
Slides: https://docs.google.com/presentation/d/1SjKXF7hkoqyiN9-3tBGY...
Similar here, with a todo.md in the project that outlines the work to be done. This gets combined with developer and/or user documentation which outlines features and how they're expected to work. I'll iterate with the agent on the planning and documentation several times until the documentation and plan look good. The only gotcha I've hit a couple of times is that I'll have the tests and spec in place before implementation, and sometimes the agent will try to edit the tests rather than making the implementation match the spec/tests.
I'm definitely babysitting the process more than vibe coding, and I review each cycle's results. As for languages, mostly TS/JS and Rust with a bit of C# here and there depending on what I need. Claude Code's Opus does a pretty good job with Rust, so for anything personal, I've just gone with it.
Work has been limited to working out specific problems, or a small utility/library that I can pull in, but on my own system, separate from work resources.
One additional benefit we get from sddw is that the agent drives the spec creation using the scenario we put into the command/skill. It does the research (local/web), asks the operator questions, and later asks for confirmation of each block in the spec.
Naive question: how much time do you spend doing so vs. doing the actual work yourself?
I have been building AI agents full time since Nov 2024. I stopped coding completely around mid-summer 2025; I was using Cursor at that time. When you build a platform-like application and already have a few plugins, an AI coder can create the next one in such a way that you can't tell which ones you wrote yourself.
At the end of 2025 I switched to Claude Code. Compared to Cursor, this opened up a different level of automation, including e.g. the possibility of running swarms of agents: https://news.ycombinator.com/item?id=48407998 using subscription limits.
So I spend all my time figuring out how to squeeze everything possible out of AI rather than out of myself. AI scales; I don't.
At the moment I predominantly work with Python and hence PyCharm as the main IDE. However, I've built this plugin https://plugins.jetbrains.com/plugin/31117-agent-cli to render agentic CLIs as an editor tab in PyCharm and also some notification hooks so I don't have to switch windows and it's easy to jump around the code while the agent is doing its work.
Besides that, I have a collection of custom skills (planning for JIRA tickets, GitHub PR creation, code review, etc.) and a set of MCPs (most are for internal tooling), and most of the time I use Claude Code.
I've shifted to a "slow code" approach with AI, treating it more like a design partner than a code generator.
I mostly do TDD with TypeScript. I write the test, write the code myself (sometimes with the help of LLM), and then hand it to the LLM. Instead of asking it to write things for me, I use it to find edge cases, check if it's leak-proof, and verify efficiency.
For architecture questions, I debate with it for a while. I almost never ask for code without conversing 4-5 times first to push back on its assumptions. It's the best rubber-ducking partner I've had.
Personal plug: I wrote more about why/how I use AI to write slow, better code on my blog: https://nabraj.com/blog/ai-write-slow-better-code
I'm a contractor (AWS and web apps), so I get a lot of sometimes-ambiguous requests. I have a five-part workflow via Claude/Codex skills: discovery->implementation planning->implementation->verification->review
Each phase writes to `./.agents/plans/{plan-name}/` in the project root. All in Markdown. That way, the flow is agent-agnostic. Each phase artifact is immutable after being written.
More details:
First, I put all the information that I have (documents, client statements, any code, my own summary, etc.) into a document, which I pass to the discovery planning skill.
The discovery phase more formally defines the project in terms of functional requirements, non-functional requirements, constraints, risks, and assumptions. This might take a few passes to get everything nailed down.
After that, I begin an implementation planning phase using the discovery artifact (`discovery.md`). We define the work in terms of phases, where each phase has various tasks associated with it (all checkboxes). Again, this usually requires a few passes.
After that, I have a clear idea of the work needed and can send an estimate to the client. Or, if it's a personal project, get started actually building it. I have another phase for actual implementation.
Verification and review are similarly defined. They can be done by any agent.
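To make the artifact convention above concrete, here is a rough Python sketch of how a phase could write its Markdown file under `./.agents/plans/{plan-name}/` and refuse to overwrite it afterwards. Only `discovery.md` and the directory layout come from the description; the other phase file names and both function names are assumptions made for the example.

```python
from pathlib import Path

# Phase names follow the five-part workflow; file names other than
# discovery.md are assumed for illustration.
PHASES = ("discovery", "implementation-plan", "implementation", "verification", "review")


def write_phase_artifact(root: Path, plan_name: str, phase: str, body: str) -> Path:
    """Write one phase artifact as Markdown, refusing to overwrite an existing one."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    plan_dir = root / ".agents" / "plans" / plan_name
    plan_dir.mkdir(parents=True, exist_ok=True)
    target = plan_dir / f"{phase}.md"
    # Mode "x" raises FileExistsError if the file is already there, which is
    # what keeps an artifact immutable once written.
    with target.open("x", encoding="utf-8") as handle:
        handle.write(body)
    return target


def open_tasks(plan_markdown: str) -> list[str]:
    """Return the text of every unchecked `- [ ]` task in a plan."""
    tasks = []
    for line in plan_markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("- [ ] "):
            tasks.append(stripped[len("- [ ] "):])
    return tasks
```

Because everything is plain Markdown on disk, any agent (or a human) can read the same files, and `open_tasks` gives a quick view of what is still unchecked in a plan.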
I use Claude Code, flow for reusable skills/prompts, and leaf for reading Markdown comfortably in the terminal.
- Claude Code
- flow: https://github.com/RivoLink/flow
- leaf: https://github.com/RivoLink/leaf
- GNOME Terminal
It's a pretty terminal-first workflow.
I type in a text box and tell the AI what to do. Yeah, my tooling is just a text box. Like Google search is just a text box.
There's lots of ways. You have to upskill through the stages IMO. Write code, write w/ agent, write w/ multi agents, write w/orchestrators.
My way is to just run a giant AI agent factory engine and have the agents run the full flow themselves (plan long term, write PRD, task, review).
Here's ~4000 commits in the last month as an example (I have about 10k-ish including private/work stuff): https://github.com/portpowered/you-agent-factory/commits/mai...
The premise when you get to full automation is generally that you go full industrial engineering:
1. watch overall flow, improve process via continuous improvement
2. work via checklists and gates.
3. replace process with mechanisms as much as possible (code > agents)
4. optimal throughput is continual testing and iteration (CI, CD), coverage, full e2e tests, mock everything, general best practices really.
decent blog: https://openai.com/index/harness-engineering/
general points:
- build lots of linters
- document literally everything (arch, prd, best practices in repo)
- too many agents at the same time create lots of code conflicts, so you need to consider how to architect the code to maximize concurrency.
Genuinely curious - in your case, where do the requirements for what needs to be built come from?
In every project I've touched, business requirements are always the bottleneck - so I've never been able to wrap my head around what kind of requirements can be fed into a setup like this at high enough volume to justify it.
Have you been able to build anything substantial with AI factory itself? I have done some of these experiments myself on these sort of things and found they ended up often being less effective than using the latest tools in harnesses like claude code.
But curious if you've found it to be a big unlock. I have been doing some of this industrial engineering myself.
MacOS, Ghostty, Tmux, Neovim, Workmux[1], OpenCode/Claude Code, and lots of markdowns.
1 - https://github.com/raine/workmux
Throw tmux in and this is my exact setup as well. I'm looking into finding ways to orchestrate entire workflows across multiple agents in a harness-agnostic way (locally at first).
I forgot to say, I use Tmux + Workmux
https://github.com/raine/workmux
Maybe you're interested in https://github.com/RivoLink/leaf. If so, I'd appreciate your feedback.
I did a similar workshop between Feb-April (1 hour zoom call on Wednesday, 3 hour hands-on in person every week)
Most of the participants had Windows laptops (except one with a Mac).
We had suggested Linux on WSL2 and VSCode. (`uv` for python package management)
But realized that we were spending a LOT of time fighting the tools/combination. WSL2 + Windows filesystem + uv did not work well together.
For the person on macOS, it was smooth sailing.
If I do another batch, we'll use native `pip` and python (not uv) and I think then we won't need WSL2
DeepSeek and Mecha-AI as CLI coding agent for general architecture [1]
Sublime Text and a DeepSeek plugin for file by file cosmetic fixes
Nothing else. With these tools I am building apps like never before in minutes instead of months
[1] https://www.npmjs.com/package/mecha-ai
If you are teaching newbies, just get them into the Claude Code or Codex desktop apps.
For devs:
Claude, Codex and Cursor. All on the $20 subscription.
Then use Conductor for worktrees w/ Claude/Codex for mid-size tasks and code review.
Cursor for manual or small changes w/ Composer 2.5.
virgin project:
1/ spec driven dev (https://github.com/github/spec-kit)
2/ then degrade to multiple sessions (no worktrees) debugging various problems until it's done
On UI Design (MacOS, Web):
1/ AI does a first pass. Try to give it style guidance on my own (colors, style, etc).
2/ Prompt ChatGPT.com with screenshots and ask for recommendations on how to make it better.
3/ Codex the changes (with minor edits)
4/ loop 2-3, ask Gemini for feedback too
a tmux session where every window is a claude code instance in a different checkout of the repo
and then an MCP+Channels system that lets the claudes DM each other
plus the Telegram channel so one of the claudes can talk to me over text message
I wrote my own tooling around the raw LLMs:
I can tick files in Vim, those get concatenated into a prompt. Along with a feature request. Plus an instructions file that tells the LLM how to reply. Plus my general "rules for good code" file, plus one "rules for good code" file per language involved, plus a project specific overview file. The LLM then answers with a list of changes it wants to make to the code. My tooling then applies those changes and I look at them via "git diff". If I like it, I commit. If not, I change one of the prompts and start the process again.
Instead of replying with code changes, the LLM can also decide to request more files. I wrote a little DSL for that.
I described the beginnings of this workflow last July:
https://www.gibney.org/prompt_coding
Feels like an eternity ago. I think I will write a new blog post this July and describe how the workflow has evolved over the past year.
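For readers who want to picture the prompt-assembly step described above, here is a small Python sketch. It is not the author's tooling: the function name, the `FEATURE REQUEST:` and `FILE:` labels, and the ordering of the sections are all assumptions, and the custom DSL for requesting more files is left out.

```python
from pathlib import Path


def build_prompt(
    feature_request: str, instruction_files: list[Path], source_files: list[Path]
) -> str:
    """Concatenate instructions, rules, and selected source files into one prompt."""
    sections = []
    # Reply instructions, general and per-language rules, project overview.
    for path in instruction_files:
        sections.append(path.read_text(encoding="utf-8").strip())
    sections.append(f"FEATURE REQUEST:\n{feature_request.strip()}")
    for path in source_files:
        # Each file is labelled with its path so the model can refer to it in its reply.
        sections.append(f"FILE: {path}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(sections)
```

The resulting string would be sent to the model, and the reply parsed into edits that are applied to the working tree and then inspected with `git diff`, as the comment describes.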
Something different that other folks might not have thought of: Robust multi-environment infra deploy scripts that leverage terraform + AWS SSO
I've found that converting stuff that's previously been very ops-cli heavy into very detailed skills has worked really really well.
I use Claude Opus 4.8 + Conductor as my daily driver
Claude Code and/or Codex from Ghostty/Terminal. You don't need to complicate it.
Self made TUI that just lists LXC containers.
I have a base container.
"A" to make a new instance.
Pi.dev when I hit enter on any container. Hot swap anthropic enterprise and openai and openrouter as needed.
Every container has the dev env already running for my current projects. Iterate, rarely use vim when needed, spec driven and have llm draft prs for me then I review.
I know the codebase in and out so what I want done is on bypass mode and then I review closer at the draft PR step before marking ready for the team.
I have a vibe coded script which creates a git worktree + zellij pane with a specific layout + a virtualenv per feature. "tmuxinator" style.
The zellij layout includes panes for OpenCode, a shell, a neovim, inotify tests, etc.
I cycle through the zellij sessions during agent prefills.
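A rough Python sketch of the per-feature setup described above, limited to the git worktree and virtualenv parts. The function names and the sibling-directory naming scheme are assumptions, and the zellij step is omitted because it depends on a personal layout file. The sketch only builds the commands so they can be reviewed before anything runs.

```python
import shlex
from pathlib import Path


def feature_setup_commands(repo: Path, feature: str) -> list[list[str]]:
    """Build the commands for a new worktree plus virtualenv, without running them."""
    # Assumed convention: the worktree lives next to the repo as <repo>-<feature>.
    worktree = repo.parent / f"{repo.name}-{feature}"
    return [
        # Create a new branch named after the feature and check it out in its own worktree.
        ["git", "-C", str(repo), "worktree", "add", "-b", feature, str(worktree)],
        # Give that worktree an isolated virtualenv.
        ["python3", "-m", "venv", str(worktree / ".venv")],
    ]


def render_commands(commands: list[list[str]]) -> str:
    """Render the commands as shell-quoted lines for a dry run."""
    return "\n".join(shlex.join(command) for command in commands)
```

Each command list can be passed to `subprocess.run(command, check=True)` once the dry-run output looks right.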
Claude Code + very opinionated TypeScript. Try to push as much as possible as far left in the SDLC (types -> lint rules -> tests -> md) and try to improve the dev ex after every single PR.
ChatGPT, request minimal necessary diff to make a specific change, review, ctrl+c, ctrl+v
Like you, I was late to the AI party, and I still find it hard to give up on coding and let the AI do everything. However, I have learned to trust the AI a little in the past few months.
Stanford University offered the course "CS146S: The Modern Software Developer" in Fall 2025. Check it out if interested. https://themodernsoftware.dev/
What is your impression of the course?
Lead Dev for a Security Company with a very strict AI policy.
Mostly Hand coded, using an agent in the browser (Claude / Corporate ChatGPT account) when necessary. I am aware we will fall behind using this methodology and have advocated for change, but I suppose it comes with the territory.
I don't think it's clear you'll fall behind. Your competitors could very well be vibe coding themselves into messes they will never recover from.
MacOS, Ghostty, Neovim, Pi (with a fair bit of customization to each). I'm relatively new to Pi after using Codex pretty heavily, but it's nice to be able to customize things to how I want.
Currently using Arch Linux with VS Code; for the server, I'm going with Vercel since it costs nothing.
My stack is really boring, just VSCode + Ghostty and Claude Code team plan (premium seat).
Codex pretty much the only tool I use now
I use OpenCode with a three agent combo (architect, developer, reviewer), as I've found it's crucial that different models write the code vs review it.
More details here:
https://www.stavros.io/posts/how-i-write-software-with-llms/
So, you don’t have any experience in it but want to run a workshop?
I'm a bit of a fanboy, but exe.dev + their Shelley web agent is pretty great
Why do you like exe.dev over other providers? Just curious, I haven't tried any yet but I am interested.
One is the sword (claude code) one is the shield (codex)
I feel it’s important that this should be mentioned at least once in a thread like this: none. I choose to program the old-fashioned way, and do not anticipate this changing in the foreseeable future, and believe that I’ll cope just fine in my niche; and if it becomes commercially unviable, well, I may no longer be interested in the field anyway.
I won’t go into any details on why here, because that would make it too much about me. There have been plenty of discussions of reasons, trade-offs, &c. Plenty of people are rejecting this stuff, for a wide variety of reasons.
But one thing I will say: if I were teaching someone to program, I would actively discourage them entirely from using AI stuff, even though it will seem to help. (I mean someone that wants to learn programming, not someone that just wants results and is not interested in programming as such.)
Cool!
This thread is meant for people who use AI, though.
I’m already doing this with my school (givedirection.com), and you’re gonna have a hard time nailing this down because no two setups are alike.
Especially along the range from newbie to expert, it’s extremely variable, and you’re not gonna be able to pick one that rules them all.
I would suggest you revamp your approach and have different courses for different types of people. I had to split my course into a basic and an advanced one, and they are extremely different.
Even within the advanced course, fairly simple stuff like hosting your own LLMs seems to really be a stretch for a lot of people.
OpenCode + their Go subscription.
Start with a nice batteries included setup, read anthropic's knowledge share, play and iterate, stay human in the loop.
Check out Dax Raad (behind OC) on the Pragmatic Engineer podcast, I think you will like his philosophies, I sure do.
The simplest mainstream options for tools:
1) Claude Desktop which includes Claude Code for Anthropic: https://claude.com/product/claude-code (alternatively the terminal based version; either way get the subscription)
2) Codex for OpenAI: https://developers.openai.com/codex/app (same as above, subscription preferred instead of paying per token)
3) OpenCode for a variety of models: https://opencode.ai/ (they also have a subscription, but this in particular also makes it really easy to connect to OpenRouter)
4) KiloCode is essentially the above, but for VSC derived editors: https://kilo.ai/ (I personally liked RooCode more, but that got retired)
More niche tooling options:
1) Zed is pretty good, though I saw some issues with their LSP Edits and found that connecting them to OpenCode through ACP worked better, still a cool editor: https://zed.dev/
2) If you have to pay for tokens and can't get subscriptions, look at DeepSeek as a provider (V4 Pro with Max reasoning): https://api-docs.deepseek.com/quick_start/pricing
3) I'm also writing a launcher to make running Claude Code with 3rd party providers easier, early days still: https://ccode.kronis.dev/
Note: for anyone on Windows, if you install the terminal versions of the tools (Claude Code, Codex, OpenCode, ...), you probably want them inside of WSL so there's less confusion with file paths etc. that some models have.
In regards to actually using the tech: