Feed Subscription
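Suited to long-term tracking of a single research direction. This page summarizes the feed's activity over the last 7 and 30 days, and keeps each day's matched entries and digest links.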
Last 7 days: 16 papers
Last 30 days: 74 papers
All time: 119 papers
Agent Runtime Security has no new matching papers today.
If this feed also matches keywords in your configuration, a long-term tracking entry point appears here.
Review this feed's matched papers day by day. Each day's digest is kept as its original Markdown / JSON output.
"Jailbreaking for the Average Jane: Choosing Optimal Jailbreaks via Bandit Algorithms for Automatically Enhanced Queries" [Evaluation / Application / Method]: With a profusion of jailbreaks for LLMs now widely known, a growing concern is that…
"How Reliable Is Your Jailbreak Judge? Calibration and Adversarial Robustness of Automated ASR Scoring" [Method]: Almost every paper on LLM jailbreaks and prompt injection reports an attack-success rate (ASR), and that number…
"Burnyard: Future of Malware Analysis" [Method]: Malware analysis is a critical aspect of modern cybersecurity. The prevailing industry practice, sandboxing, involves executing suspicious binaries within isol…; "LLMs Prompted…
"Capable but Careless: Do Computer-Use Agents Follow Contextual Integrity?" [Evaluation / Application / Method]: Computer-use agents (CUAs) now act on a user's behalf across personal applications such as email, calendars, and to-do lists. Thi…
"What Do Safety-Aligned LLMs Learn From Mixed Compliance Demonstrations?" [Method]: Prior work has shown that in-context demonstrations can jailbreak language models, but it remains unclear how models interpret different type…
"CodeSentinel: A Three-Layer Defense Against Indirect Prompt Injection in Code Contexts" [Method]: Code large language models increasingly retrieve external code context from repositories, documentation, issue threads, and co…
"Seeing Is Not Screening: Multimodal Hidden Instruction Attacks on Agent Skill Scanners" [Application / Method]: Agent skills are emerging as an important attack surface in LLM-based systems. Through an empirical study of existing ski…
"Automated jailbreak attack targeting multiple defense strategies" [Evaluation / Method]: Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks. However, their safety remains a critical c…
"Neuro-Symbolic Agents for Regulated Process Automation: Challenges and Research Agenda" [Application / Method]: LLM-based agents are entering regulated industries where they automate judgment intensive quality management processes. W…
"Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code" [Evaluation / Method]: Large Language Models (LLMs) are increasingly used for code generation, raising concerns that they may be misused to produce mali…
"Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation" [Evaluation / Application / Method]: Large language model (LLM) agents are rapidly moving from conversational interfaces to software components that plan, invoke t…
"WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces" [Evaluation / Method]: Computer-use agents (CUAs) increasingly operate in runtimes that combine visual desktop control, command-line ex…
"GuardNet: Ensemble Strategies of Shallow Neural Networks for Robust Prompt Injection and Jailbreak Detection" [Evaluation / Data / Application / Method]: Large Language Models (LLMs) have transformed natural language processing, but they remai…
"MaskForge: Structure-Aware Adaptive Attacks for Jailbreaking Diffusion Large Language Models" [Evaluation / Application / Method]: Diffusion large language models (dLLMs) generate text by iteratively denoising partially masked sequences unde…
"D-Judge: Disrupting Multi-Turn Jailbreaks using Semantics-Preserving Output Rewriting" [Evaluation / Data / Method]: Multi-turn jailbreak attacks pose a growing threat to large language model (LLM) safety because they exploit feedback…
"Jailbreaking Multimodal Large Language Models using Multi-Clip Video" [Data / Application / Method]: As multimodal large language models (MLLMs) have advanced to process video inputs, concerns have emerged about their potential for mal…
"Provably Secure Agent Guardrail" [Evaluation / Application / Method]: As large language models transition from bounded generative engines to agents with expansive execution privileges, AI going out of control precipitates a funda…; "Robust an…
"Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents" [Data / Method]: Computer-use agents (CUAs) have recently made substantial progress, but deploying a separate large expert for each software…
"EviACT: An Evidence-to-Action Framework for Agentic Program Repair" [Evaluation / Method]: LLM-based agents have moved automated program repair (APR) from fixed-context patch generation to interactive repository-level repair. Howeve…
"CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents" [Evaluation / Data / Application / Method]: Reinforcement learning with verifiable rewards (RLVR) has driven breakthroughs in domains such as math, tool-use,…
"DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback" [Evaluation / Application / Method]: LLM-powered AI agents require high-frequency state exploration (e.g., test-time tree search and reinforcement learn…
"Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling" [Application / Method]: Computer-use agents (CUA) automate tasks specified with natural language such as "order the cheapest item from Taco Bell" by gene…
"Attention-Guided Reward for Reinforcement Learning-based Jailbreak against Large Reasoning Models" [Evaluation / Method]: Large Reasoning Models (LRMs) have demonstrated remarkable capabilities in solving complex problems by generat…
"An Empirical Study of Privacy Leakage Chains via Prompt Injection in Black-Box Chatbot Environments" [Method]: LLM-based chatbot agents increasingly process user requests by combining natural-language reasoning with external…
"Sleeper Channels and Provenance Gates: Persistent Prompt Injection in Always-on Autonomous AI Agents" [Evaluation / Application / Method]: Always-on AI agents (OpenClaw, Hermes Agent) run as a single persistent process under the owner's iden…
"Metaphor Is Not All Attention Needs" [Application / Method]: Large language models are increasingly deployed in safety-critical applications, where their ability to resist harmful instructions is essential. Although post…; "A microser…
"Break the Brake, Not the Wheel: Untargeted Jailbreak via Entropy Maximization" [Evaluation / Method]: Recent studies show that gradient-based universal image jailbreaks on vision-language models (VLMs) exhibit little or no cross-mod…
"Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation" [Evaluation / Application / Method]: Self-hosted computer-use agents (SHCUAs), such as OpenClaw, combine natural-language interaction with direct acce…