agent security Topic Archive

agent-security.html
Long-term tracking RSS feed for the keyword "agent security", collecting all previously matched papers.
Language: zh-CN
Sun, 28 Jun 2026 05:24:06 +0000

Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation
../papers/arxiv-3e30da0f0823.html
https://arxiv.org/abs/2606.10749v1#2026-06-10#agent-security
Wed, 10 Jun 2026 13:25:04 +0800
Large language model (LLM) agents are rapidly moving from conversational interfaces to software components that plan, invoke tools, maintain memory, and act on external environments. This transition changes the nature of security risk. In agentic settings, failures are no longer limited to unsafe text generation. Untrusted content may redirect control flow, misuse tool privileges, corrupt persistent state, leak sensitive information, or trigger harmful external actions. At the same time, resear…

SecureClaw: Clawing Back Control of LLM Agents
../papers/arxiv-e8e56c532b8d.html
https://arxiv.org/abs/2606.09549v1#2026-06-09#agent-security
Tue, 09 Jun 2026 13:12:49 +0800
Tool-using large language model (LLM) agents face two distinct security failures: unauthorized external actions and exposure of sensitive plaintext inside the runtime before any final output check can intervene. Existing defenses usually protect one boundary, either the planner/runtime or the action sink, and therefore do not by themselves secure both surfaces. We present SecureClaw, a dual-boundary architecture that places authorization at the effect sink and plaintext confinement at the read…

SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents
../papers/arxiv-bb5871c96b14.html
https://arxiv.org/abs/2606.02302v1#2026-06-02#agent-security
Tue, 02 Jun 2026 13:56:35 +0800
Autonomous LLM agents increasingly operate in stateful environments where they access tools, files, memory, and external services. While such capabilities enable complex real-world workflows, they also introduce security risks that are difficult to capture with existing evaluations. Current agent security benchmarks often rely on manually curated tasks, provide limited coverage of emerging threats, and focus primarily on final outcomes rather than the execution processes that lead to unsafe beh…