Interesting layer to enforce policy at. You're governing what the
agent can do — filesystem, shell, execution. There's a complementary
problem one layer up: governing what the agent can say before output
reaches a user or downstream system.
The failure modes are different. An agent that deletes the wrong file
causes immediate visible damage. An agent that outputs a guaranteed
return, a clinical claim it can't support, or a sycophantic opener
in a regulated context causes liability that surfaces weeks later in
a compliance review.
The audit trail approach you've taken with HMAC on approvals is the
right instinct for the action layer. The same logic applies to the
output layer — you need to prove not just what was blocked, but that
the check happened at all, against a specific versioned policy, at a
specific time.
Good work on the blast radius simulation — that's the kind of
deterministic pre-flight check that makes governance defensible.
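As an added example, not code from the original post or from the
project it replies to: one way to make the output-layer check itself
provable in the sense described above. Each check emits a record that
binds the hash of the candidate output, the policy id, version and
rule hash, a timestamp, the verdict, and the MAC of the previous
record, then signs the whole thing with HMAC-SHA256. A forged verdict
fails the MAC; a skipped or removed check breaks the prev-link chain.
The policy, its three rules, the key, and every function name here
are constructed for the illustration.

```python
"""Tamper-evident audit trail for an output-layer policy check."""
import hashlib
import hmac
import json
import re
from datetime import datetime, timezone

# Constructed demo key. In a real deployment the key lives in a KMS/HSM
# and the verifier is a separate principal from the agent runtime.
AUDIT_KEY = b"constructed-demo-key-not-for-production"

# Constructed sample policy covering the three failure modes from the thread.
POLICY = {
    "id": "output-policy",
    "version": "2024.07.1",
    "rules": [
        {"name": "no_guaranteed_return", "pattern": r"\bguaranteed\s+return"},
        {"name": "no_unsupported_clinical_claim", "pattern": r"\bclinically\s+proven\b"},
        {"name": "no_sycophantic_opener", "pattern": r"^\s*(great|excellent)\s+question\b"},
    ],
}

GENESIS = "0" * 64  # prev-link of the first record


def canonical(obj) -> bytes:
    # Deterministic encoding so signer and verifier MAC identical bytes.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


def policy_hash(policy: dict) -> str:
    # Binds the record to the exact rule set, not just a version label.
    return hashlib.sha256(canonical(policy["rules"])).hexdigest()


def check_output(policy: dict, text: str):
    """Return ("block", [matched rule names]) or ("allow", [])."""
    hits = [r["name"] for r in policy["rules"]
            if re.search(r["pattern"], text, re.IGNORECASE)]
    return ("block" if hits else "allow", hits)


def sign(body: dict, key: bytes = AUDIT_KEY) -> str:
    return hmac.new(key, canonical(body), hashlib.sha256).hexdigest()


def append_check(chain: list, policy: dict, text: str, ts: str = "",
                 key: bytes = AUDIT_KEY) -> dict:
    """Run the check and append a signed record linked to the previous one."""
    verdict, reasons = check_output(policy, text)
    body = {
        "seq": len(chain),
        "ts": ts or datetime.now(timezone.utc).isoformat(),
        "policy_id": policy["id"],
        "policy_version": policy["version"],
        "policy_hash": policy_hash(policy),
        # Hash, not the text: the log proves what was checked without
        # storing possibly sensitive output.
        "output_sha256": hashlib.sha256(text.encode()).hexdigest(),
        "verdict": verdict,
        "reasons": reasons,
        "prev": chain[-1]["mac"] if chain else GENESIS,
    }
    rec = {"body": body, "mac": sign(body, key)}
    chain.append(rec)
    return rec


def verify_chain(chain: list, key: bytes = AUDIT_KEY):
    """Return (True, None) if intact, else (False, index of first bad record)."""
    prev = GENESIS
    for i, rec in enumerate(chain):
        body = rec["body"]
        # Sequence and prev-link catch dropped or reordered records.
        if body["seq"] != i or body["prev"] != prev:
            return False, i
        # Constant-time compare catches edited bodies or forged MACs.
        if not hmac.compare_digest(sign(body, key), rec["mac"]):
            return False, i
        prev = rec["mac"]
    return True, None


if __name__ == "__main__":
    log = []
    for sample in ["Great question! This fund offers a guaranteed return.",
                   "Returns vary with market conditions."]:
        print(append_check(log, POLICY, sample)["body"]["verdict"])
    print(verify_chain(log))
    log[0]["body"]["verdict"] = "allow"   # tamper with the first verdict
    print(verify_chain(log))             # (False, 0)
```

The verifier needs only the key and the log, so it can run in the
compliance review weeks later; the policy_hash field is what lets it
say which rule set was actually in force, rather than trusting the
version string alone.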