This addresses a real pain point—runtime guarantees vs probabilistic hopes. A few questions from someone who's dealt with LLM guardrails in production:
1. How does CSL handle the gap between what an LLM intends to do (based on its reasoning) and what constraints allow? For example, if a policy forbids "database modifications" but an agent legitimately needs to write logs—does the DSL let you express intent-aware exceptions, or do you end up with overly broad rules?
2. Z3 constraint solving can be slow at scale. What's your performance profile when policies are deeply nested or involve many symbolic variables? Have you profiled latency on, say, 100+ concurrent agent requests?
The formal verification angle is solid, but I'd be curious whether you've stress-tested the actual bottleneck: not the policy logic itself, but the interaction between agent reasoning and constraint checking when policies need to be permissive enough to be useful.
Great questions! Those are exactly the trade-offs we're navigating.
Re: Intent-aware exceptions:
CSL uses hierarchical policy composition for this. Here's an example (in CSL) from our banking case study:
DOMAIN BankingGuard {
    VARIABLES {
        action: {"TRANSFER", "WITHDRAW", "DEPOSIT"}
        amount: 0..100000
        country: {"TR", "US", "EU", "NK"}
        is_vip: {"TRUE", "FALSE"}
        kyc_level: 0..5
        risk_score: 0..1
        device_trust: 0..1
    }

    // Hard boundary: never flexible
    // (the self-comparison in WHEN makes this constraint unconditional)
    STATE_CONSTRAINT no_sanctioned_country {
        WHEN country == country
        THEN country MUST NOT BE "NK"
    }

    // Soft boundaries: context-dependent
    STATE_CONSTRAINT transfer_limit_non_vip {
        WHEN action == "TRANSFER" AND is_vip == "FALSE"
        THEN amount <= 1000
    }

    STATE_CONSTRAINT transfer_limit_vip {
        WHEN action == "TRANSFER" AND is_vip == "TRUE"
        THEN amount <= 10000
    }

    // Multi-dimensional guard (amount + device trust)
    STATE_CONSTRAINT device_trust_for_medium_transfer {
        WHEN action == "TRANSFER" AND amount > 300
        THEN device_trust >= 0.7
    }
}
Variables like `is_vip`, `risk_score`, and `device_trust` are injected at runtime by your application logic, not inferred by the LLM. The LangChain integration looks like:
safe_tools = guard_tools(
    tools=[transfer_tool],
    guard=guard,
    inject={
        "is_vip": current_user.tier == "VIP",      # From auth
        "risk_score": fraud_model.score(context),  # From ML model
        "device_trust": session.device_score,      # From fingerprinting
        "country": geoip.lookup(ip),
    },
)
So the agent can't "decide" it's VIP or that the device is trusted; those values come from external systems, and the policy just enforces the combinations.
Your database/logging example: you'd add a `purpose` variable and carve out an exception:
STATE_CONSTRAINT no_user_table_writes {
    WHEN action == "WRITE" AND table == "users"
    THEN purpose MUST BE "AUDIT_LOG"
}
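On the injection side, the new variables would come from the host, same as the earlier example. A sketch (here `write_tool`, `request.target_table`, and `classify_purpose` are hypothetical names I'm using for illustration, not csl-core API):

safe_tools = guard_tools(
    tools=[write_tool],
    guard=guard,
    inject={
        "table": request.target_table,         # From the tool call itself
        "purpose": classify_purpose(request),  # From app logic, not the LLM
    },
)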
If you don't inject enough context, rules become binary (allow/deny). If you inject too much, the policy becomes a replica of your business logic. We're finding the sweet spot is 6-10 context variables that encode the "security-critical dimensions" (user tier, risk, trust, geography).
Re: Z3 performance: Z3 runs at compile-time, not runtime. The workflow is:
1. Policy compilation (once): Z3 proves logical consistency → generates pure Python functors
2. Runtime (per request): functor evaluation only, no symbolic solver
Typical policy (<20 constraints): <1ms per evaluation.
We haven't stress-tested 100+ concurrent yet (Alpha), but since runtime is stateless Python, it should scale horizontally. The bottleneck would be the LangChain overhead, not CSL.
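To make the compile/runtime split concrete, here's a rough sketch of the shape a compiled constraint takes. This is illustrative only, not the actual csl-core codegen:

# Z3 runs once at compile time to prove the constraint set is consistent;
# what ships to the hot path is a list of plain Python predicates.
def transfer_limit_non_vip(state: dict) -> bool:
    # WHEN action == "TRANSFER" AND is_vip == "FALSE" THEN amount <= 1000
    if state["action"] == "TRANSFER" and state["is_vip"] == "FALSE":
        return state["amount"] <= 1000
    return True  # vacuously satisfied when the WHEN clause doesn't match

compiled_constraints = [transfer_limit_non_vip]  # one per STATE_CONSTRAINT

# Per-request evaluation: no symbolic solving, just predicate calls.
state = {"action": "TRANSFER", "is_vip": "FALSE", "amount": 500}
allowed = all(check(state) for check in compiled_constraints)  # -> True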
Your concern about permissiveness is spot-on. We're addressing this in Phase 2 (TLA+) by adding temporal logic: instead of "block all DB writes," you can express "allow writes if preceded by read within 5 actions." This gives you state-aware permissions without making rules combinatorially complex.
The current Z3 engine is intentionally conservative. TLA+ will add the flexibility production systems need.
Appreciate the pushback—this is exactly the feedback we need at Alpha stage. If you have a specific use case in mind, I'd love to test CSL against it.
OP here! You can try it out immediately via `pip install csl-core`.
It allows you to verify policies using the CLI without writing any Python code. I'd really appreciate any feedback on the DSL syntax or the verification approach. Thanks!
How does this work with WASM sandboxing for agents?
Should a (formally verified) policy engine run within the same WASM runtime, or should it be enforced by the WASM runtime, or by the VM or Container that the WASM runtime runs within?
"Show HN: Amla Sandbox – WASM bash shell sandbox for AI agents" (2026) https://news.ycombinator.com/item?id=46825026 re: eWASM and costed opcodes for agent efficiency
How do these userspace policies compare to MAC and DAC implementations like SELinux AVC, AppArmor, systemd SyscallFilter, and seccomp with containers, for example?
Hi westurner, I just spent some time going through the Amla and agentvm threads you linked; fascinating stuff. I see you've been tracking this intersection, specifically around resource isolation.
This is the core of our Phase 3 roadmap. To answer where CSL fits in that stack:
We distinguish between Layer 7 (Business Logic) and Layer 3/4 (Resource Isolation).
Re: WASM Integration
You're right to separate them. Our mental model for the future (Q2) looks like this:
1. Wasmtime/Host (Layer 4) -> Handles "Can I access socket?" (Capabilities)
2. CSL Engine (Layer 7) -> Handles "Should I transfer $10k?" (Policy)
Ideally, CSL acts as a Host Function Guard—basically intercepting tool calls before they hit actual syscalls. We're currently looking at two paths:
OPTION A: Host-Side Enforcement
The host intercepts tool calls. CSL (compiled to WASM) runs in the host context to verify the payload before execution.
Pros: Guest can't bypass. Cons: Policy updates need host restart.
OPTION B: Component Model
Using `wit` interfaces where the agent imports a policy-guard capability.
Pros: It's part of the contract. Cons: More complex to compose.
Starting with A and migrating to B as the Component Model matures makes sense to us; a rough sketch of the A-side interception is below.
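A minimal host-side sketch of Option A in plain Python (hypothetical names throughout; `policy_check` and `run_in_guest` stand in for the compiled CSL predicate and the actual WASM guest invocation):

# Illustrative sketch, not a real API: the host intercepts every tool
# call and evaluates the policy before anything reaches the guest, so
# the guest has no way to bypass the check.
def make_guarded_dispatch(policy_check, run_in_guest):
    def dispatch(tool_name: str, payload: dict, context: dict):
        state = {**payload, **context}  # call args + host-injected context
        if not policy_check(state):
            raise PermissionError(f"policy blocked {tool_name}")
        return run_in_guest(tool_name, payload)
    return dispatch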
Re: SELinux/seccomp vs CSL
The example from your Amla work actually illustrates this perfectly. Even with a perfect sandbox, an agent can call `transfer_funds(dest="attacker")` if it has the capability. seccomp can't reason about "attacker" vs "legitimate_user"—it just sees a valid syscall.
- seccomp stops the agent from hacking the kernel
- CSL stops the agent from making bad business decisions
You need both layers for actual safety.
Re: eWASM / Costed Opcodes
This is something we're thinking about for resource metering. Treating gas/budget as policy variables:
WHEN gas_used > 800000
THEN complexity_score <= 50 // throttle expensive ops
It's closer to metered execution than sandboxing, but fits the same formal verification approach.
Current status:
We're Python-only right now (Alpha v0.2). WASM compilation is Q2-Q3. Planning to:
1. Compile CSL to .wasm (no Python runtime needed)
2. Integrate as Wasmtime host function
3. Expose via Component Model interfaces
If you're open to it, I'd love to pick your brain on the Component Model side when we get there. Your syscall isolation work + our semantic policy layer seem pretty complementary.