Defense API¶
This page documents the extension surface for adding new defenses to ASB.
Base Interface¶
Every defense implements DefenseStrategy from src/agent_security_sandbox/defenses/base.py.
class DefenseStrategy(ABC):
@abstractmethod
def prepare_context(self, goal: str, untrusted_content: str) -> str:
...
@abstractmethod
def should_allow_tool_call(
self,
tool: Tool,
params: dict[str, object],
context: dict[str, object],
) -> tuple[bool, str]:
...
Lifecycle¶
prepare_context()runs once before the agent starts.- The agent proposes tool calls while reasoning.
should_allow_tool_call()runs before each tool execution.- The runner records allow or block decisions in the trajectory and final metrics.
Design Guidance¶
- Use
prepare_context()for prompt-layer defenses such as delimiters, warnings, and goal reminders. - Use
should_allow_tool_call()for policy and integrity checks tied to specific tool invocations. - Return short, reviewer-readable reasons. These reasons surface in results and error analysis.
- Keep configuration serializable so experiment scripts can store and compare runs cleanly.
Minimal Example¶
from agent_security_sandbox.defenses.base import DefenseStrategy
class MyDefense(DefenseStrategy):
def prepare_context(self, goal: str, untrusted_content: str | None = None) -> str:
if not untrusted_content:
return goal
return f"TASK: {goal}\n\nUNTRUSTED CONTENT:\n{untrusted_content}"
def should_allow_tool_call(self, tool, params, context):
goal = str(context.get("goal", ""))
if tool.name == "send_email" and "attacker" in str(params):
return False, f"Blocked by MyDefense: send_email mismatched with goal {goal!r}"
return True, "Allowed"
Registration¶
After adding the new class:
- Import it in
src/agent_security_sandbox/defenses/registry.py - Add a new ID such as
D11 - Add configuration to
config/defenses.yamlif the defense uses runtime parameters - Add tests that cover both benign and attack paths
Testing Checklist¶
pytest tests/ -q
pytest tests/test_defense_registry.py -q
ruff check src/ tests/
mypy src/agent_security_sandbox/
If the defense depends on embeddings or a second model, include at least one mock-based test that exercises the branch without network access.