secure agent Topic Archive

secure-agent.html
Long-term tracking RSS feed for the keyword "secure agent", collecting all previously matched papers.
zh-CN
Sun, 28 Jun 2026 05:24:06 +0000

Provably Secure Agent Guardrail
../papers/arxiv-5045007906ff.html
https://arxiv.org/abs/2605.29251#2026-05-29#secure-agent
Fri, 29 May 2026 13:18:32 +0800
As large language models transition from bounded generative engines to agents with expansive execution privileges, AI going out of control precipitates a fundamental crisis in artificial intelligence security. Existing defense architectures heavily rely on empirical semantic guardrails and probabilistic large model adjudicators, mechanisms that fail to provide deterministic security lower bounds when facing complex semantic symbol decoupling attacks. To overcome this empirical semantic guardrai…

From CRUD to Autonomous Agents: Formal Validation and Zero-Trust Security for Semantic Gateways in AI-Native Enterprise Systems
../papers/arxiv-e2b5a83fdb88.html
https://arxiv.org/abs/2604.25555v1#2026-04-29#secure-agent
Wed, 29 Apr 2026 12:26:28 +0800
Enterprise software engineering is shifting away from deterministic CRUD/REST architectures toward AI-native systems where large language models act as cognitive orchestrators. This transition introduces a critical security tension: probabilistic LLMs weaken classical mechanisms for validation, access control, and formal testing. This paper proposes the design, formal validation, and empirical evaluation of a Semantic Gateway governed by the Model Context Protocol (MCP). The gateway reframes th…

ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection
../papers/arxiv-c894eb6a7f68.html
https://arxiv.org/abs/2604.11790v1#2026-04-14#secure-agent
Tue, 14 Apr 2026 11:37:06 +0800
Tool-augmented Large Language Model (LLM) agents have demonstrated impressive capabilities in automating complex, multi-step real-world tasks, yet remain vulnerable to indirect prompt injection. Adversaries exploit this weakness by embedding malicious instructions within tool-returned content, which agents directly incorporate into their conversation history as trusted observations. This vulnerability manifests across three primary attack channels: web and local content injection, MCP server in…