Indexing what the agent does is the easy half. The harder half is measuring what the human stops doing once the agent is reliable enough. The automation-bias literature, going back at least to Parasuraman and Manzey's 2010 review, is pretty clear: operators paired with capable automation let their own judgment degrade within months, not years. Most agent-safety docs treat the human as a fixed observer. The human isn't.
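
One concrete way to make "what the human stops doing" measurable is to log review and override events alongside agent actions and watch the trend over time. Here is a minimal sketch of that idea; the `AgentAction` record and field names are hypothetical, not any particular framework's API:

```python
# A minimal sketch (all names hypothetical) of tracking oversight drift:
# given a log of agent actions, compute the human override rate per
# rolling window. A falling rate with flat agent accuracy suggests the
# human, not the agent, is what's changing.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class AgentAction:
    timestamp: datetime
    human_reviewed: bool   # did a human look at this action at all?
    human_overrode: bool   # did the human change or block it?

def override_rate(
    log: list[AgentAction], window: timedelta
) -> list[tuple[datetime, float]]:
    """Rolling override rate among *reviewed* actions, one point per window."""
    if not log:
        return []
    log = sorted(log, key=lambda a: a.timestamp)
    rates = []
    start, end = log[0].timestamp, log[-1].timestamp
    while start <= end:
        bucket = [a for a in log if start <= a.timestamp < start + window]
        reviewed = [a for a in bucket if a.human_reviewed]
        if reviewed:
            rate = sum(a.human_overrode for a in reviewed) / len(reviewed)
            rates.append((start, rate))
        start += window
    return rates
```

The rate alone can't tell you whether the human stopped overriding because the agent got better or because they stopped looking; that distinction needs periodic ground-truth audits on a sample of unreviewed actions.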