great project. think my agent will need it. but then one thing i notice is that this only catches single tool calls. most of the time the malicious behavior is a sequence where each call looks fine on its own: read a file, read another, then a curl to somewhere benign-sounding. individually each one scores low. the arc is the dangerous part and per-call scoring kinda misses that.
great project. think my agent will need it. but then one thing i notice is that this only catches single tool calls. most of the time the malicious behavior is a sequence where each call looks fine on its own: read a file, read another, then a curl to somewhere benign-sounding. individually each one scores low. the arc is the dangerous part and per-call scoring kinda misses that.
With the release of opus 4.7 i've been more and more concerned about ai agents, il'' take a look
Looks really interesting!
i don't know, are the layer deterministic or probabilistic?
Hi! they are deterministic
[dead]