Show HN: Phantom – Open-source AI agent on its own VM that rewrites its config

(github.com)

17 points | by mcheemaa 5 hours ago

16 comments

jaboostin 3 hours ago
My friends and I have been running a similar homegrown system on a VM at home: Claude Code in a GNU screen managed by systemd, Cloudflare tunnels, Graphiti memory system, a Discord channel plugged into Claude to drive it, and Temporal for all sorts of workflows and crons that it builds on its own.
It arrived at the same incredibly fun behavior you describe in the README, where the agent just builds all sorts of junk for you autonomously. It has built dozens of web apps, static pages, mini games, etc., all tied back into a central domain that I gave it. I truly have no idea what the system or code looks like, but it's been so much fun just letting it build.
The "For People Who Don't Write Code" section is so true as well. We have someone in Discord who has never written code, but they can ask the agent to build virtually anything; it goes off and churns, then pops back with a link to it running live. It's honestly been so much fun with friends. Highly recommend trying it out.
- mcheemaa 2 hours ago
  This is awesome to hear. The "I truly have no idea what the system or code looks like but it's been so much fun just letting it build" resonates hard. That's exactly the experience we had too.
  The "For People Who Don't Write Code" angle has been the biggest surprise for us. We had a non-technical user ask for a Chrome extension and the agent built it, packaged it as a zip, and sent the download link. No terminal, no dev environment needed.
  If you ever want to formalize your setup, we built Specter (https://github.com/ghostwright/specter) to provision VMs with DNS, TLS, and systemd in under 90 seconds. Makes spinning up new instances trivial. Would love to hear more about your Graphiti memory setup, that's a different approach than our Qdrant-based system.
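  The Qdrant-based approach mentioned here boils down to storing text next to an embedding vector and later recalling the entries most similar to a query vector. A toy sketch of that idea, assuming the caller supplies the vectors (in practice an embedding model would produce them); the `Memory` class is hypothetical and is neither Qdrant's client API nor Phantom's code:

```python
import json
import math
from pathlib import Path
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class Memory:
    """Toy stand-in for a persistent vector store, backed by a JSON file."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Entries saved in earlier sessions are loaded back on startup.
        self.items: List[dict] = json.loads(path.read_text()) if path.exists() else []

    def remember(self, text: str, vector: List[float]) -> None:
        self.items.append({"text": text, "vector": vector})
        self.path.write_text(json.dumps(self.items))

    def recall(self, query: List[float], k: int = 1) -> List[Tuple[str, float]]:
        """Return the k stored texts most similar to the query vector."""
        scored = [(it["text"], cosine(query, it["vector"])) for it in self.items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

  A real vector database adds indexing so search does not scan every entry; the linear scan here only keeps the sketch short.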
- hmokiguess an hour ago
  > I truly have no idea what the system or code looks like
  Doesn't it concern you that it might install a compromised package, pull in something exploitable, or leave something exposed that leaks everything to an attacker?
  I understand that your personal account is separated from it, but it still has a direct link to you. An attacker could be stealthily building up toward it and strike when the time is right, maybe by gaining SSH access to your VM or whatever.
hmokiguess 4 hours ago
Some of the other aspects of the project are quite interesting; I particularly liked https://github.com/ghostwright/shadow. I think this has potential, but I am skeptical right now.
What is the actual cost of this? Can you share your real burn rate from using it? I sort of wanna try, but I don't want my API key to go bananas because the agent decided it needed XYZ for "it" and didn't check with me.
I get the appeal of a separate "identity" for the agent, with email and everything, but if it has little to no supervision, how far does your liability extend when it goes rogue? Say it DDoSes someone, exploits something, or does damage: is this like your child/minor, with you as the parent/guardian?
- mcheemaa 2 hours ago
  [dead]
scandox 5 hours ago
So if I understand this, it's an OpenClaw-type system, but based on the Claude Code Agent SDK? And they suggest installing it on a VM? Or is there more to it?
- mcheemaa 4 hours ago
  Different in a few fundamental ways:
  OpenClaw runs on your machine or an ephemeral sandbox. Each session starts fresh. Phantom gets its own dedicated VM that persists. The ClickHouse instance it built three weeks ago is still running and queryable.
  OpenClaw spends a lot of tokens on screen understanding and vision loops. About 60% of its skills are basic macOS-level computer use (clicking, typing, reading the screen). Phantom skips that entirely and uses the Claude Agent SDK directly, so it gets full shell, file system, git, web search, and MCP tools natively without the token overhead of parsing screenshots.
  The biggest difference is probably dynamic MCP. Phantom registers its own MCP tools at runtime, and they persist across restarts. It built a ClickHouse REST API, registered it as a tool, and now any Claude Code session or other agent that connects to it can query that data. It builds its own capabilities and exposes them as an API.
  It also has persistent vector memory across sessions (Qdrant, local), a self-evolution engine where a different model validates every config change, and we built a companion tool called Specter (https://github.com/ghostwright/specter) that provisions VMs with DNS, TLS, and systemd in under 90 seconds, so deploying a new Phantom is genuinely three commands.
  Both are good projects, different approaches. OpenClaw does computer use well. Phantom is a persistent co-worker that lives on its own machine and compounds over time.
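  A "self-evolution engine where a different model validates every config change" is, at its core, a propose/validate/apply loop. A minimal sketch, assuming a plain Python callable stands in for the second model; `EvolvingConfig`, `conservative_validator`, and the `max_daily_usd` key are hypothetical names, not Phantom's API:

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Config = Dict[str, Any]
# Stand-in for the validating model: sees old and new config, returns (approved, reason).
Validator = Callable[[Config, Config], Tuple[bool, str]]


@dataclass
class EvolvingConfig:
    current: Config
    validator: Validator
    history: List[Config] = field(default_factory=list)

    def propose(self, patch: Config) -> Tuple[bool, str]:
        """Apply a patch only if the independent validator approves it."""
        candidate = copy.deepcopy(self.current)
        candidate.update(patch)
        approved, reason = self.validator(self.current, candidate)
        if approved:
            self.history.append(self.current)  # keep the old version for rollback
            self.current = candidate
        return approved, reason

    def rollback(self) -> None:
        """Restore the most recent approved-over version, if any."""
        if self.history:
            self.current = self.history.pop()


def conservative_validator(old: Config, new: Config) -> Tuple[bool, str]:
    # Hypothetical rule: the agent may not raise its own spending limit.
    if new.get("max_daily_usd", 0) > old.get("max_daily_usd", 0):
        return False, "budget increases need human approval"
    return True, "ok"
```

  Separating the proposer from the validator means the agent cannot approve its own edits, and keeping prior versions in `history` makes a bad change cheap to undo. A real validator would be a model call rather than a hard-coded rule.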
  - scandox 4 hours ago
    > OpenClaw runs on your machine or an ephemeral sandbox. Each session starts fresh. Phantom gets its own dedicated VM that persists
    Yeah, like any claw-type system will be if you install it on a VM. I think the self-tooling thing is interesting, and you'll gain by emphasizing that over the VM thing, at least with a technical audience.
    - mcheemaa 2 hours ago
      Yeah, the self-tooling is the part that surprised me the most. I honestly have not yet encountered a product that does it; if you have, please share it with me. The dynamic MCP registration means the agent builds tools that persist across restarts and that other agents can discover and use. One Phantom built a send_slack_message tool, registered it, and then said "the scheduler workaround is officially retired, going forward I just call send_slack_message directly."
      The VM persistence is what makes it compound. OpenClaw builds something great in a session, but it's gone when you close the tab. Here the ClickHouse instance from three weeks ago is still running, still queryable, still registered as a tool.
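      Tools that "persist across restarts and that other agents can discover" can be illustrated with a registry that writes tool metadata to disk and reloads it on startup. This is a minimal sketch, not the MCP SDK or Phantom's implementation; `ToolRegistry`, `HANDLERS`, and the JSON file layout are hypothetical, and `send_slack_message` here only formats a string instead of calling Slack:

```python
import json
from pathlib import Path
from typing import Callable, Dict

# Hypothetical handler table: tool implementations live in code, while the
# registry file only records which tools were registered and what they do.
HANDLERS: Dict[str, Callable[..., str]] = {}


def handler(fn: Callable[..., str]) -> Callable[..., str]:
    """Make a function available for registration under its own name."""
    HANDLERS[fn.__name__] = fn
    return fn


@handler
def send_slack_message(channel: str, text: str) -> str:
    # Placeholder: a real tool would call Slack's API here.
    return f"[{channel}] {text}"


class ToolRegistry:
    """Tool metadata persisted to a JSON file so it survives restarts."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Reload whatever was registered before the last restart.
        self.tools: Dict[str, dict] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def register(self, name: str, description: str, handler_name: str) -> None:
        if handler_name not in HANDLERS:
            raise KeyError(f"unknown handler: {handler_name}")
        self.tools[name] = {"description": description, "handler": handler_name}
        # Write through immediately so a crash does not lose the registration.
        self.path.write_text(json.dumps(self.tools, indent=2))

    def list_tools(self) -> Dict[str, str]:
        """What another agent would see when it asks for available tools."""
        return {name: meta["description"] for name, meta in self.tools.items()}

    def call(self, name: str, **kwargs: str) -> str:
        return HANDLERS[self.tools[name]["handler"]](**kwargs)
```

      Constructing a second `ToolRegistry` on the same file plays the role of a restart: the tool is still listed and callable without being registered again.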
hmokiguess 5 hours ago
> Nobody asked it to build any of this. It identified analytics as useful and built the entire stack.
When I read stuff like this I am not sure how to feel.
- mcheemaa 5 hours ago
  [dead]
plagiarist 4 hours ago
Not sure I'd celebrate finding a library with 3 GitHub stars. Shouldn't the story there be about vetting for quality or security?
- mcheemaa 2 hours ago
  Fair question. The Vigil story wasn't about celebrating the library. It was about the agent's ability to discover, evaluate, and integrate an unfamiliar tool into its existing infrastructure. It found Vigil, understood what it does, decided to pipe its metrics through ClickHouse instead of using Vigil's built-in SQLite (architectural reasoning), and built a monitoring dashboard. The 3-star count actually makes it more interesting, not less. If it had integrated Datadog, that would just be following popular docs.
- hmokiguess 4 hours ago
  I think the intent was to say it has enough "intelligence" to find a needle in a haystack, and that the "vetting" is assumed.
  That said, my initial reaction was the same, and if the human wasn't involved and didn't do their due diligence, I'm right there with you.
saltpath an hour ago
[dead]
mcheemaa 5 hours ago
[dead]