Post on Tuesday or Wednesday, 8-9am EST
Title is just: "The Uncovenanted Agent Problem"
Replace [GitHub link] with actual repo URL when live under Kova name
First comment should be from you: brief context on who you are and why you built this
Respond to EVERY comment in the first 6 hours
Don't be defensive. Thank critics. Ask follow-up questions.
If someone finds a real flaw, acknowledge it publicly and say you'll address it
DO NOT mention your age unless directly asked. Let the work speak.
You also published all strategy documents, all internal project plans, even the marketing/adoption plan.
https://github.com/nobulexdev/nobulex/blob/main/docs/crisis-...
"HN Posting Notes
Internal only. Delete before posting.
When posting UNCOVENANTED-AGENT-PROBLEM-HN.md:
"> DO NOT mention your age unless directly asked. Let the work speak.
I'd agree. Why does the age matter?
it's OK, the commit "remove internal strategy docs" has definitely removed them
Fair catch. Those are internal notes from earlier iterations of the project. I should have cleaned the docs folder before pushing to the new repo. Removing them now.

On the age thing: those notes were written back when I was planning to let the work stand on its own. I later decided to include it because I figured being upfront is better than people finding out later and feeling misled. Either way, the code is the code: 134K lines, 6,115 tests, all on npm. Judge it on that.
HN loves the idea of kid geniuses
An AI agent performs actions. Those actions must comply with "controls" like 'ALLOW transfer ON treasury WHERE amount <= 10000 AND currency = "USDC"' and provide public, auditable proof that they complied with the spec. The action log appears to be verifiable via ZK proofs.
What's the application here? If you want to enforce that an agent's blockchain transactions follow some deterministic conditions, why not just give it access to a command-line tool (MCP / skill / whatever) that enforces your conditions?
If you want auditing of the agent's blockchain actions to be public, why not just make all your agent's actions go through an ordinary smart contract?
I don't mean to kill your enthusiasm for programming or AI. But this project...I'm sorry, but this project just isn't good. It's an over-engineered, vibe-coded "solution" in search of a problem.
This project is about a month old. I highly doubt one person produced 134 kloc in that time. I'm pretty sure a lot of it is vendorized dependencies and AI-generated code that's had minimal human review. Much of the documentation appears to be AI-generated as well.
On "why not a CLI tool / smart contract": for single-agent, single-system setups, you are completely right. Nobulex is for when a third party has to verify compliance independently across systems. But the current examples don't make that clear enough.

On the code: yes, heavily AI-assisted. I designed the architecture; AI helped implement it. I am 15 and in school, no team. The project has been through many iterations over several months, not one month.

On "solution in search of a problem": maybe. What would you consider worth solving in this space?
How is this different from building a CLI tool that allows/disallows certain behaviors?
For example, I have a Gmail CLI that just wraps the Gmail API and I specifically give AI certain powers and withhold other abilities. I log every action taken.
Is this a meta framework for this or an NPM package that does something like that?
Your Gmail CLI is doing the right thing by manually restricting what the agent can do and logging every action. Nobulex is the generalized version of that pattern.
The difference: your CLI controls one agent on one tool with rules you have hardcoded. Nobulex gives you signed, immutable constraints that third parties can verify independently. The logs are hash-chained so nobody (including you) can tamper with them after the fact. And the constraints are cryptographically bound to the agent's identity.
If you are truly the only one who needs to trust your agent, your approach works fine. Nobulex matters when someone else needs to verify what your agent has done: a regulator, a customer, a counterparty.
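To make "hash-chained so nobody can tamper after the fact" concrete, here is a minimal sketch of the idea using only Node's built-in crypto. This is an illustration of the pattern, not Nobulex's actual API; all names here are made up for the example.

```typescript
import { createHash } from "node:crypto";

// One audit-log entry: the action plus the hash of the previous entry.
interface LogEntry {
  action: string;
  prevHash: string;
  hash: string;
}

const entryHash = (action: string, prevHash: string): string =>
  createHash("sha256").update(prevHash + action).digest("hex");

// Append an action; its hash commits to the entire history before it.
function append(log: LogEntry[], action: string): void {
  const prevHash = log.length ? log[log.length - 1].hash : "GENESIS";
  log.push({ action, prevHash, hash: entryHash(action, prevHash) });
}

// A verifier recomputes every hash; any edited entry breaks the chain.
function verifyChain(log: LogEntry[]): boolean {
  let prev = "GENESIS";
  for (const e of log) {
    if (e.prevHash !== prev || e.hash !== entryHash(e.action, prev)) return false;
    prev = e.hash;
  }
  return true;
}

const log: LogEntry[] = [];
append(log, "transfer 9000 USDC");
append(log, "read customer record");
console.log(verifyChain(log)); // true
log[0].action = "transfer 90000 USDC"; // tamper after the fact
console.log(verifyChain(log)); // false
```

Anyone holding the log can rerun `verifyChain` without trusting whoever produced it, which is the property being claimed here.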
Based on the feedback here, I have built a quick demo of the multi-party use case: why cryptography matters when a third party needs to verify compliance independently:
https://github.com/nobulexdev/nobulex/blob/main/demo/two-par...
Run it: npx tsx demo/two-party-verify.ts
Three steps: the operator creates a covenant and claims compliance; then a regulator verifies the cryptographic proof without trusting the operator. That is the core of what Nobulex does; everything else is tooling around this pattern. Appreciate the pushback; it helped clarify what actually matters.
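The two-party shape of that demo can be sketched with Node's built-in Ed25519 support. This is illustrative only (not the actual demo code), and note the limitation in the comment, which matches the criticism below:

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Operator side: sign the covenant plus the claimed log root.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const covenant =
  'ALLOW transfer ON treasury WHERE amount <= 10000 AND currency = "USDC"';
const claim = Buffer.from(JSON.stringify({ covenant, logRoot: "abc123" }));
const signature = sign(null, claim, privateKey);

// Regulator side: needs only the public key, the claim, and the signature.
// Caveat: this proves the operator signed this exact claim, not that the
// claim is true or that any particular enforcement code was running.
console.log(verify(null, claim, publicKey, signature)); // true

// An altered claim fails verification.
const forged = Buffer.from(JSON.stringify({ covenant, logRoot: "evil" }));
console.log(verify(null, forged, publicKey, signature)); // false
```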
> I have built a quick demo
This is obvs 5 minutes of LLM generated code
> a regulator verifies the cryptographic proof without trusting the operator.
No, the regulator verifies that the operator signed the proof, which isn't a lot different from the operator saying it alone.
if you have pre-execution enforcement, what's the point of the verification protocol? the ability to apply stricter covenants to past action logs after the fact? i'm not sure i follow the use-case for that.
Good question.
Enforcement and verification serve different audiences.
Enforcement protects you: it stops your agent from doing something it shouldn't. Verification protects everyone else: it lets a third party independently confirm that the enforcement actually happened, without trusting you. You say "my agent followed the rules"; the regulator says "prove it." The hash-chained logs and signed covenants are the proof. Without verification, it's just your word.
makes sense. the core modules that i looked at look pretty good. (action-log, verifier, composability, dsl and parser).
all the kitchen sink stuff makes it pretty intense though. have you considered separating out just the core execution, logging and verification components? stuff like c2pa seems super cool, but maybe a second layer for application type things like that so that the core consensus stuff can be inspected easily? one goal for a system like this is easy auditability of the system itself.
That is exactly the direction I'm heading based on feedback from this thread: the core primitives (action-log, verifier, covenant DSL, parser) as a small, auditable package, and everything else (c2pa, otel, langchain, compliance adapters) as a separate layer that builds on top.
You are right that auditability of the system itself is the goal. It's very hard to trust a trust layer you can't easily inspect. Appreciate you digging deep into the code.
I am 15 (a sophomore in high school). I have been building Nobulex for the past several months: 60 npm packages, 134K lines of TypeScript, and 6,115 tests.
The problem: AI agents are making real decisions about loans, trades, hiring, and diagnostics with zero cryptographic proof of what they have done or whether they followed any rules. The EU AI Act requires tamper-evident audit trails by August 2026. Nobody has infrastructure for this.
Nobulex is three things:
Agents sign behavioral covenants before they act (cryptographic commitments: "I will not do X").
Middleware enforces those covenants at runtime; violations are blocked before execution.
Every action is logged in a hash-chained, Merkle-tree audit trail that anyone can verify independently.
The quickstart is 3 lines:
const { protect } = require('@nobulex/sdk');
const agent = await protect({ name: 'my-agent', rules: ['no-data-leak', 'read-only'] });
npm install @nobulex/sdk
Everything is MIT licensed and on npm under @nobulex/*. Site: https://nobulex.com
Would love feedback on the architecture, the covenant model, or anything else. Happy to answer questions.
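The Merkle-tree part of the audit trail mentioned above can be sketched in a few lines. This is an illustration of the idea (not the actual @nobulex implementation): publishing just the root commits to every logged action.

```typescript
import { createHash } from "node:crypto";

const h = (s: string): string => createHash("sha256").update(s).digest("hex");

// Fold a list of actions up to a single Merkle root.
function merkleRoot(actions: string[]): string {
  if (actions.length === 0) return h("empty");
  let level = actions.map(h);
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const right = level[i + 1] ?? level[i]; // duplicate an odd leaf
      next.push(h(level[i] + right));
    }
    level = next;
  }
  return level[0];
}

const actions = ["transfer 9000 USDC", "read record 42", "send report"];
const root = merkleRoot(actions);
// Changing any single action changes the published root.
console.log(root !== merkleRoot(["transfer 90000 USDC", "read record 42", "send report"])); // true
```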
It looks almost entirely envisioned and implemented by AI.
An agent signing a covenant doesn't do anything. You're not going to enforce a contract against it, and there's not some kind of non-repudiation problem to solve.
Enforcing behavioral covenants or boundaries is inherent to how you make things safe. But how do you really do it for anything that matters? How do you make sure that an agent isn't discriminating based on race or other factors?
The whole reason you're using an LLM is because you're doing something either:
A) at very low scale, in which case it's hard to capture sufficient covenants cost-efficiently,
or B) with very great complexity, where the behavior you want is hard to encapsulate in code-- in which case meaningful enforcement of the complex covenants that may result is hard.
Indeed, if you could just write code to do it, you'd just write code to do it.
I'm glad you're interested in these issues and playing with them. I'll leave you with one last thought: 134 KSLOC is a bug, not a feature. Some software systems need to be huge, but for software systems that need to be trusted, small, auditable, and understandable to humans (and agents) is the key thing you're looking for. Could you build some kind of small trustable core that solves a simple problem in an understandable way?
You're right about the 134K point. The actual cryptographic kernel (covenant building, verification, hash-chaining) is only about 3-4K lines; the rest is adapters, plugins, and test harnesses. I should lead with that number.

On enforcement: the covenant itself isn't the enforcement. Middleware intercepts tool calls before execution and blocks violations. But you're right that this only works for constraints you can express as rules. "No external calls" and "rate limit 100/hour" are enforceable; "don't discriminate" is not. That's a fundamentally harder problem, and I'm not pretending Nobulex solves it.

The small-trustable-core advice is good and probably what I should focus on next. Thank you.
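The pre-execution middleware pattern described above can be sketched as deterministic rule checks run before a tool call executes. This is a hypothetical illustration, not Nobulex's API; the rule names and types are invented for the example.

```typescript
// A tool call the agent wants to make, and a rule that can veto it.
type ToolCall = { tool: string; url?: string; timestamp: number };
type Rule = (call: ToolCall, history: ToolCall[]) => string | null; // null = allowed

// "No external calls": mechanically checkable before execution.
const noExternalCalls: Rule = (call) =>
  call.url && !call.url.startsWith("https://internal.example")
    ? "external call blocked"
    : null;

// "Rate limit 100/hour": also mechanically checkable.
const rateLimit100PerHour: Rule = (call, history) => {
  const hourAgo = call.timestamp - 3600_000;
  return history.filter((c) => c.timestamp > hourAgo).length >= 100
    ? "rate limit exceeded"
    : null;
};

// Run every rule; block before execution on the first violation,
// otherwise record the call in the history/log.
function enforce(
  call: ToolCall,
  history: ToolCall[],
  rules: Rule[],
): { allowed: boolean; reason?: string } {
  for (const rule of rules) {
    const violation = rule(call, history);
    if (violation) return { allowed: false, reason: violation };
  }
  history.push(call);
  return { allowed: true };
}

const history: ToolCall[] = [];
const rules = [noExternalCalls, rateLimit100PerHour];
console.log(enforce({ tool: "fetch", url: "https://internal.example/api", timestamp: Date.now() }, history, rules).allowed); // true
console.log(enforce({ tool: "fetch", url: "https://evil.example", timestamp: Date.now() }, history, rules).allowed); // false
```

Note that nothing like "don't discriminate" fits this shape: there is no deterministic predicate over a single tool call that captures it, which is exactly the limitation conceded above.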
Why does whether the agent "commits" to a rule cryptographically matter?
Surely it's just the enforcement, and maybe the measuring of sentinel events -- how far does it wander off course.
How is cryptography an important part of this, given that we're talking about a layer that sits on top of an LLM without an adversary in-between?
I know you mention non-repudiation, but ... there's no kind of real non-repudiation here in this environment.
Very fair question. If you control the whole stack (your agent, your middleware, your logs), then cryptography doesn't add much. You already trust yourself.
But it matters when there are multiple parties. An enterprise deploys an agent that handles customer data. The customer wants proof the agent followed the rules. The regulator wants proof that the logs were not edited after an incident. Without cryptographic signatures and hash chains, the enterprise can only say "trust us." With them, the claim is independently verifiable.
It's the difference between "we followed the rules" and "here's a mathematically verifiable proof we followed the rules." For internal use, it's overkill. For anything with external accountability, that's the point.
There's no mathematically verifiable proof that anyone followed the rules. There's a cryptographic chain, but it just means "this piece of the stack, at some point, was convinced to process this and recorded that it did this." -- not whether that actually happened, what code was running, etc.
It doesn't tell you anything about what code was running there or whether it was really enforced.
Look, it's cool that this is an area that interests you. But I want you to know that AI agents are sycophantic and will claim your ideas are good and will not necessarily steer you in good directions. I have patents in the area of non-repudiation dating back 25 years and am doing my best to give you good feedback.
Non-repudiation, policy enforcement, audit-readiness, ledgers: these are all good things. As far as I can tell, there's nothing too special about doing this with LLMs, too. The same kinds of code that a bank uses to ensure that its ledger isn't tampered with and that the right software is running in the right places would work for this job -- and it wasn't vibe coded and mostly specified by AI.
You’re correct. The cryptographic chain proves “this middleware processed this action and recorded it,” not that the enforcement logic was correct or that the code running was what you think it was. Those are different guarantees, and I have been conflating them.
On “nothing too special about doing this with LLMs,” also fair. The primitives (policy enforcement, audit trails, non-repudiation) aren’t new. The bet is that AI agents will need these at a scale and standardization level that does not exist yet, and having it as a composable library matters when every framework (LangChain, CrewAI, Vercel AI SDK) is building agents differently. But the underlying cryptography isn’t novel.
Proving policy controls are in place and that actions were taken is a fairly universal problem.
Cryptography doesn't really do as much to improve it as one would think. Yes, providing evidence of sequence or that stuff happened before a certain time is a helpful tool to have in the toolbox.
The earliest human writings date to about 3000-3500 BCE, and are almost entirely ledgers on clay tablets.
I want to point out a little asymmetry. It's a little rude to generate a bunch of stuff, including writing, using LLMs, and then expect actual humans to interact with it. If it wasn't your time to do and understand and say, why should it be worth others' time to read and respond to it?