Hi — author here. One clarification: the goal is not to let an AI freely control a computer. I built a fixed local action skill library. Each skill is a deterministic OS operation (open app, switch window, run command, structured input). The model does not generate UI steps or mouse actions; it only selects a skill, and the gateway executes it. So the LLM is making decisions, not performing motor control. The computer isn’t remotely driven by the model; the model chooses from a constrained set of allowed actions.
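To make the selection-only contract concrete, here is a minimal sketch of how a gateway with a fixed allowlist could look. All names here are illustrative assumptions, not the actual implementation; the real skills would invoke OS primitives rather than return strings:

```python
# Hypothetical sketch: the model returns only a skill name; the gateway
# checks it against a fixed allowlist and runs the matching action.

def open_app(name: str) -> str:
    # A real gateway would call into the OS here; this stub just
    # records the intent so the control flow is visible.
    return f"opened {name}"

# Fixed skill library: skill name -> zero-argument deterministic action.
SKILLS = {
    "open_browser": lambda: open_app("browser"),
    "open_editor": lambda: open_app("editor"),
}

def execute(skill_name: str) -> str:
    """Gateway: execute only allowlisted skills; reject everything else."""
    if skill_name not in SKILLS:
        raise PermissionError(f"skill not in allowlist: {skill_name!r}")
    return SKILLS[skill_name]()
```

The point of the shape is that the model’s output is a key, never code: anything outside the allowlist fails closed before touching the OS.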
This is mainly an experiment in making computer-using agents more predictable and auditable. I’d especially value thoughts from people working on agent safety.
Another clarification, since a few people messaged me privately: this is not just a conceptual architecture. We actually tested it using the official Claude mobile app controlling a real desktop computer. The phone runs the model inside the official app, which produces instructions in natural language. Our gateway parses the intent and maps it to a verified local action skill (keyboard/window/command primitives). So the model is not embedded in the OS and is not calling an API; it is literally the mobile LLM app interacting with a real operating system through a constrained execution layer. We were interested in whether an official consumer LLM app (without system privileges) could still reliably operate a computer when paired with a deterministic action layer.