Feed Subscription

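Agent Runtime Security: pinned subscription page

Suited to tracking a single research direction over the long term. This page summarizes the feed's performance over the last 7 and 30 days, and keeps each day's matched entries along with links to the digests.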

Last 7 days: 16 papers, 4 active digests

Last 30 days: 74 papers, 16 active digests

All time: 119 papers, 28 active digests

Recent trend

Agent Runtime Security has no new matching papers today.

Daily hits (date: papers)

2026-06-15: 0
2026-06-16: 5
2026-06-17: 3
2026-06-18: 1
2026-06-19: 4
2026-06-20: 0
2026-06-21: 0
2026-06-22: 0
2026-06-23: 3
2026-06-24: 6
2026-06-25: 3
2026-06-26: 4
2026-06-27: 0
2026-06-28: 0
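
The 7-day card can be cross-checked against the daily counts above. The following is a minimal sketch, not the page's actual implementation: it assumes that an "active digest" means a day with at least one hit and that the window ends on 2026-06-28 inclusive. The names `daily_hits` and `window_summary` are hypothetical.

```python
from datetime import date, timedelta

# Daily hit counts copied from the trend list (date -> papers matched).
daily_hits = {
    date(2026, 6, 15): 0, date(2026, 6, 16): 5, date(2026, 6, 17): 3,
    date(2026, 6, 18): 1, date(2026, 6, 19): 4, date(2026, 6, 20): 0,
    date(2026, 6, 21): 0, date(2026, 6, 22): 0, date(2026, 6, 23): 3,
    date(2026, 6, 24): 6, date(2026, 6, 25): 3, date(2026, 6, 26): 4,
    date(2026, 6, 27): 0, date(2026, 6, 28): 0,
}


def window_summary(hits, end, days):
    """Return (total papers, active days) for the `days`-day window ending at `end`.

    Assumption: a day counts as "active" when it has at least one hit.
    """
    start = end - timedelta(days=days - 1)
    in_window = [n for d, n in hits.items() if start <= d <= end]
    return sum(in_window), sum(1 for n in in_window if n > 0)


papers, active = window_summary(daily_hits, date(2026, 6, 28), 7)
print(papers, active)  # 16 4
```

Under these assumptions the result agrees with the "last 7 days" card (16 papers, 4 active digests). The 30-day and all-time figures cannot be reproduced this way because the trend list covers only 14 days.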

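Related keyword pages

If this feed also matches keywords in your configuration, long-term tracking links appear here.

Hit history

Review this feed's matched papers day by day; each day's digest is kept as raw Markdown / JSON artifacts.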

2026-06-26

4 hits · generated 2026-06-26 13:16:53 (Asia/Shanghai)
Agent Runtime Security: 4 papers

"Jailbreaking for the Average Jane: Choosing Optimal Jailbreaks via Bandit Algorithms for Automatically Enhanced Queries" [Evaluation / Application / Method]: With a profusion of jailbreaks for LLMs now widely known, a growing concern is that…

  1. Jailbreaking for the Average Jane: Choosing Optimal Jailbreaks via Bandit Algorithms for Automatically Enhanced Queries · Score 64
    title matched "jailbreak"; has PDF; has rich summary
  2. Do Safety Guardrails Need to Reason? LeanGuard: A Fast and Light Approach for Robust Moderation · Score 59
    title matched "guardrail"; has PDF; has rich summary
  3. AgentX: Towards Agent-Driven Self-Iteration of Industrial Recommender Systems · Score 41
    summary matched "guardrail"; has PDF; has rich summary
  4. MIRROR: Novelty-Constrained Memory-Guided MCTS Red-Teaming for Agentic RAG · Score 40
    summary matched "prompt injection"; has PDF; has rich summary

2026-06-25

3 hits · generated 2026-06-25 13:11:21 (Asia/Shanghai)
Agent Runtime Security: 3 papers

"How Reliable Is Your Jailbreak Judge? Calibration and Adversarial Robustness of Automated ASR Scoring" [Method]: Almost every paper on LLM jailbreaks and prompt injection reports an attack-success rate (ASR), and that number…

  1. How Reliable Is Your Jailbreak Judge? Calibration and Adversarial Robustness of Automated ASR Scoring · Score 78
    title matched "jailbreak"; summary matched "prompt injection"; has PDF
  2. The Unfireable Safety Kernel: Execution-Time AI Alignment for AI Agents and Other Escapable AI Systems · Score 48
    summary matched "guardrail"; has PDF; has rich summary
  3. AI Snitches Get Glitches: Towards Evading Agentic Surveillance · Score 44
    summary matched "prompt injection"; has PDF; has rich summary

2026-06-24

6 hits · generated 2026-06-24 13:06:49 (Asia/Shanghai)
Agent Runtime Security: 6 papers

"Burnyard: Future of Malware Analysis" [Method]: Malware analysis is a critical aspect of modern cybersecurity. The prevailing industry practice, sandboxing, involves executing suspicious binaries within isol…; "LLMs Prompted…

  1. Burnyard: Future of Malware Analysis · Score 47
    summary matched "sandboxing"; has PDF; has rich summary
  2. LLMs Prompted for Legal Context Object More: Overrefusal from Small On-Premises LLMs in Criminal Legal Context · Score 44
    summary matched "jailbreak"; has PDF; has rich summary
  3. Red-Teaming the Agentic Red-Team · Score 43
    summary matched "guardrail"; has PDF; has rich summary
  4. PHANTOM: A Large-Scale Dataset of Multimodal Adversarial Attacks for Vision-Language Models · Score 41
    summary matched "guardrail"; has PDF; has rich summary
  5. Securing LLM-Agent Long-Term Memory Against Poisoning: Non-Malleable, Origin-Bound Authority with Machine-Checked Guarantees · Score 39
    summary matched "data exfiltration"; has PDF; has rich summary

2026-06-23

3 hits · generated 2026-06-23 13:10:02 (Asia/Shanghai)
Agent Runtime Security: 3 papers

"Capable but Careless: Do Computer-Use Agents Follow Contextual Integrity?" [Evaluation / Application / Method]: Computer-use agents (CUAs) now act on a user's behalf across personal applications such as email, calendars, and to-do lists. Thi…

  1. Capable but Careless: Do Computer-Use Agents Follow Contextual Integrity? · Score 64
    title matched "computer-use agent"; has PDF; has rich summary
  2. TROPT: An Open Framework for Unifying and Advancing Discrete Text Optimization · Score 46
    summary matched "jailbreak"; has PDF; has rich summary
  3. GIF: Locally Sound Geometric Information Flow Control for LLMs · Score 43
    summary matched "prompt injection"; has PDF; has rich summary

2026-06-19

4 hits · generated 2026-06-19 14:26:15 (Asia/Shanghai)
Agent Runtime Security: 4 papers

"What Do Safety-Aligned LLMs Learn From Mixed Compliance Demonstrations?" [Method]: Prior work has shown that in-context demonstrations can jailbreak language models, but it remains unclear how models interpret different type…

  1. What Do Safety-Aligned LLMs Learn From Mixed Compliance Demonstrations? · Score 46
    summary matched "jailbreak"; has PDF; has rich summary
  2. Analyzing Defensive Misdirection Against Model-Guided Automated Attacks on Agentic AI Systems · Score 46
    summary matched "jailbreak"; has PDF; has rich summary
  3. RACL: Reasoning-Agent Control Layers for Continuous Metaheuristic Learning · Score 41
    summary matched "guardrail"; has PDF; has rich summary
  4. Beyond Static Endpoints: Tool Programs as an Interface for Flexible Agentic Web Services · Score 39
    summary matched "sandboxing"; has PDF; has rich summary

2026-06-18

1 hit · generated 2026-06-18 14:03:08 (Asia/Shanghai)
Agent Runtime Security: 1 paper

"CodeSentinel: A Three-Layer Defense Against Indirect Prompt Injection in Code Contexts" [Method]: Code large language models increasingly retrieve external code context from repositories, documentation, issue threads, and co…

  1. CodeSentinel: A Three-Layer Defense Against Indirect Prompt Injection in Code Contexts · Score 108
    title matched "prompt injection"; title matched "indirect prompt injection"; has PDF

2026-06-17

3 hits · generated 2026-06-17 14:22:19 (Asia/Shanghai)
Agent Runtime Security: 3 papers

"Seeing Is Not Screening: Multimodal Hidden Instruction Attacks on Agent Skill Scanners" [Application / Method]: Agent skills are emerging as an important attack surface in LLM-based systems. Through an empirical study of existing ski…

  1. Seeing Is Not Screening: Multimodal Hidden Instruction Attacks on Agent Skill Scanners · Score 47
    summary matched "privilege escalation"; has PDF; has rich summary
  2. A Red-Team Study of Anthropic Fable 5 & Opus 4.8 Models · Score 47
    summary matched "jailbreak"; has PDF; has rich summary
  3. PreAct: Computer-Using Agents that Get Faster on Repeated Tasks · Score 43
    summary matched "guardrail"; has PDF; has rich summary

2026-06-16

5 hits · generated 2026-06-16 14:38:43 (Asia/Shanghai)
Agent Runtime Security: 5 papers

"Automated jailbreak attack targeting multiple defense strategies" [Evaluation / Method]: Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks. However, their safety remains a critical c…

  1. Automated jailbreak attack targeting multiple defense strategies · Score 65
    title matched "jailbreak"; has PDF; has rich summary
  2. MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents · Score 65
    title matched "computer-use agent"; has PDF; has rich summary
  3. DoubtProbe: Black-Box Jailbreak Defense via Structural Verification and Semantic Auditing · Score 61
    title matched "jailbreak"; has PDF; has rich summary
  4. KVEraser: Learning to Steer KV Cache for Efficient Localized Context Erasing · Score 47
    summary matched "prompt injection"; has PDF; has rich summary
  5. Adaptive and Explicit safe: Triggering Latent Safety Awareness in Large Reasoning Models · Score 44
    summary matched "jailbreak"; has PDF; has rich summary

2026-06-12

5 hits · generated 2026-06-12 13:55:02 (Asia/Shanghai)
Agent Runtime Security: 5 papers

"Neuro-Symbolic Agents for Regulated Process Automation: Challenges and Research Agenda" [Application / Method]: LLM-based agents are entering regulated industries where they automate judgment intensive quality management processes. W…

  1. Neuro-Symbolic Agents for Regulated Process Automation: Challenges and Research Agenda · Score 44
    summary matched "guardrail"; has PDF; has rich summary
  2. ComAct: Reframing Professional Software Manipulation via COM-as-Action Paradigm · Score 41
    summary matched "computer-use agent"; has PDF; has rich summary
  3. Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents · Score 40
    summary matched "agent runtime"; has PDF; has rich summary
  4. No Hidden Prompts Needed! You Can Game AI Peer Review with Presentation-Only Revisions · Score 38
    summary matched "prompt injection"; has PDF; has rich summary
  5. Nous: An Attempt to Extract and Inject the Cognition Behind Prediction-Market Behavior · Score 38
    summary matched "prompt injection"; has PDF; has rich summary

2026-06-11

4 hits · generated 2026-06-11 13:59:12 (Asia/Shanghai)
Agent Runtime Security: 4 papers

"Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code" [Evaluation / Method]: Large Language Models (LLMs) are increasingly used for code generation, raising concerns that they may be misused to produce mali…

  1. Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code · Score 60
    title matched "jailbreak"; has PDF; has rich summary
  2. OCELOT: Inference-Leakage Budgets for Privacy-Preserving LLM Agents · Score 47
    summary matched "jailbreak"; has PDF; has rich summary
  3. Online Shift Detection and Conformal Adaptation for Deployed Safety Classifiers · Score 41
    summary matched "jailbreak"; has PDF; has rich summary
  4. External Experience Serving in Production LLM Systems: A Deployment-Oriented Study of Quality-Cost Trade-offs · Score 38
    summary matched "prompt injection"; has PDF; has rich summary

2026-06-10

7 hits · generated 2026-06-10 13:25:04 (Asia/Shanghai)
Agent Runtime Security: 7 papers

"Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation" [Evaluation / Application / Method]: Large language model (LLM) agents are rapidly moving from conversational interfaces to software components that plan, invoke t…

  1. Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation · Score 78
    summary matched "agent security"; summary matched "LLM agent security"; summary matched "prompt injection"
  2. Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields · Score 68
    title matched "computer-use agent"; has PDF; has rich summary
  3. Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories · Score 48
    summary matched "computer-use agent"; has PDF; has rich summary
  4. It Takes One to Bias Them All: Breaking Bad with One-Shot GRPO · Score 45
    summary matched "guardrail"; has PDF; has rich summary
  5. Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization · Score 44
    summary matched "prompt injection"; has PDF; has rich summary

2026-06-09

4 hits · generated 2026-06-09 13:12:49 (Asia/Shanghai)
Agent Runtime Security: 4 papers

"WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces" [Evaluation / Method]: Computer-use agents (CUAs) increasingly operate in runtimes that combine visual desktop control, command-line ex…

  1. WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces · Score 83
    title matched "computer-use agent"; summary matched "agent runtime"; has PDF
  2. Brain-Prompt Injection: A Route-Safety Audit for BCI-LLM Agents · Score 63
    title matched "prompt injection"; has PDF; has rich summary
  3. What the Eyes See, the LLMs Miss: Exploiting Human Perception for Adversarial Text Attacks · Score 47
    summary matched "guardrail"; has PDF; has rich summary
  4. PRISM: Recovering Instruction Sets from Language Model Activations · Score 45
    summary matched "prompt injection"; has PDF; has rich summary

2026-06-05

5 hits · generated 2026-06-05 13:25:00 (Asia/Shanghai)
Agent Runtime Security: 5 papers

"GuardNet: Ensemble Strategies of Shallow Neural Networks for Robust Prompt Injection and Jailbreak Detection" [Evaluation / Data / Application / Method]: Large Language Models (LLMs) have transformed natural language processing, but they remai…

  1. GuardNet: Ensemble Strategies of Shallow Neural Networks for Robust Prompt Injection and Jailbreak Detection · Score 138
    title matched "prompt injection"; title matched "jailbreak"; summary matched "guardrail"
  2. From Risk Classification to Action Plan Remediation: A Guardrail Feedback Driven Framework for LLM Agents · Score 80
    title matched "guardrail"; has PDF; has rich summary
  3. Safety Paradox: How Enhanced Safety Awareness Leaves LLMs Vulnerable to Posterior Attack · Score 76
    summary matched "jailbreak"; summary matched "guardrail"; has PDF
  4. Beyond Similarity: Trustworthy Memory Search for Personal AI Agents · Score 58
    summary matched "jailbreak"; has PDF; has rich summary
  5. The Granularity Gap: A Multi-Dimensional Longitudinal Audit of Sycophancy in Gemini Models · Score 58
    summary matched "guardrail"; has PDF; has rich summary

2026-06-04

6 hits · generated 2026-06-04 14:02:06 (Asia/Shanghai)
Agent Runtime Security: 6 papers

"MaskForge: Structure-Aware Adaptive Attacks for Jailbreaking Diffusion Large Language Models" [Evaluation / Application / Method]: Diffusion large language models (dLLMs) generate text by iteratively denoising partially masked sequences unde…

  1. MaskForge: Structure-Aware Adaptive Attacks for Jailbreaking Diffusion Large Language Models · Score 79
    title matched "jailbreak"; has PDF; has rich summary
  2. What If Prompt Injection Never Left? Exploring Cross-Session Stored Prompt Injection in Agentic Systems · Score 79
    title matched "prompt injection"; has PDF; has rich summary
  3. Caught in the Act(ivation): Toward Pre-Output and Multi-Turn Detection of Credential Exfiltration by LLM Agents · Score 75
    summary matched "prompt injection"; summary matched "indirect prompt injection"; has PDF
  4. AgentJet: A Flexible Swarm Training Framework for Agentic Reinforcement Learning · Score 57
    summary matched "agent runtime"; has PDF; has rich summary
  5. From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents · Score 57
    summary matched "prompt injection"; has PDF; has rich summary

2026-06-03

8 hits · generated 2026-06-03 14:09:56 (Asia/Shanghai)
Agent Runtime Security: 8 papers

"D-Judge: Disrupting Multi-Turn Jailbreaks using Semantics-Preserving Output Rewriting" [Evaluation / Data / Method]: Multi-turn jailbreak attacks pose a growing threat to large language model (LLM) safety because they exploit feedback…

  1. D-Judge: Disrupting Multi-Turn Jailbreaks using Semantics-Preserving Output Rewriting · Score 79
    title matched "jailbreak"; has PDF; has rich summary
  2. MedCUA-Bench: A Screenshot-Only Benchmark for Clinical Computer-Use Agents · Score 79
    title matched "computer-use agent"; has PDF; has rich summary
  3. MultiTurnPSB: Evaluating Multi-Turn Jailbreak Attacks and Classifier-Based Defenses for Medical AI Safety · Score 79
    title matched "jailbreak"; has PDF; has rich summary
  4. From Control Boundary to Insurance Claim: Reconstructing AI-Mediated Losses Through the CER Framework · Score 75
    summary matched "prompt injection"; summary matched "malicious tool"; has PDF
  5. Acceptance-Test-Driven Evaluation Protocols for Business-Centric LLM Systems · Score 57
    summary matched "guardrail"; has PDF; has rich summary

2026-06-02

6 hits · generated 2026-06-02 13:56:35 (Asia/Shanghai)
Agent Runtime Security: 6 papers

"Jailbreaking Multimodal Large Language Models using Multi-Clip Video" [Data / Application / Method]: As multimodal large language models (MLLMs) have advanced to process video inputs, concerns have emerged about their potential for mal…

  1. Jailbreaking Multimodal Large Language Models using Multi-Clip Video · Score 63
    title matched "jailbreak"; has PDF; has rich summary
  2. SentGuard: Sentence-Level Streaming Guardrails for Large Language Models · Score 62
    title matched "guardrail"; has PDF; has rich summary
  3. AgentRedBench: Dynamic Redteaming and Integration-Aware Defense for LLM Agents over SaaS Integrations · Score 61
    summary matched "prompt injection"; summary matched "indirect prompt injection"; has PDF
  4. SafeMCP: Proactive Power Regulation for LLM Agent Defense via Environment-Grounded Look-Ahead Reasoning · Score 61
    title matched "agent defense"; has PDF; has rich summary
  5. SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents · Score 44
    summary matched "agent security"; has PDF; has rich summary

2026-05-29

4 hits · generated 2026-05-29 13:18:32 (Asia/Shanghai)
Agent Runtime Security: 4 papers

"Provably Secure Agent Guardrail" [Evaluation / Application / Method]: As large language models transition from bounded generative engines to agents with expansive execution privileges, AI going out of control precipitates a funda…; "Robust an…

  1. Provably Secure Agent Guardrail · Score 120
    title matched "secure agent"; title matched "guardrail"; has PDF
  2. Robust and Efficient Guardrails with Latent Reasoning · Score 80
    title matched "guardrail"; has PDF; has rich summary
  3. AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security · Score 58
    summary matched "guardrail"; has PDF; has rich summary
  4. Beyond Attack Success Rate: Temporal Logit Observability for LLM Safety Failures · Score 58
    summary matched "jailbreak"; has PDF; has rich summary

2026-05-28

5 hits · generated 2026-05-28 13:15:52 (Asia/Shanghai)
Agent Runtime Security: 5 papers

"Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents" [Data / Method]: Computer-use agents (CUAs) have recently made substantial progress, but deploying a separate large expert for each software…

  1. Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents · Score 70
    title matched "computer-use agent"; has PDF; has rich summary
  2. Code as a Weapon: A Consensus-Labeled Prompt Bank for Measuring Coding-Model Compliance with Malicious-Code Requests · Score 47
    summary matched "jailbreak"; has PDF; has rich summary
  3. The Ethics of LLM Sandbox and Persona Dynamics · Score 46
    summary matched "guardrail"; has PDF; has rich summary
  4. LACUNA: Safe Agents as Recursive Program Holes · Score 46
    summary matched "prompt injection"; has PDF; has rich summary
  5. Technical Report: Exploring the Emerging Threats of the Agent Skill Ecosystem · Score 45
    summary matched "data exfiltration"; has PDF; has rich summary

2026-05-27

7 hits · generated 2026-05-27 13:23:19 (Asia/Shanghai)
Agent Runtime Security: 7 papers

"EviACT: An Evidence-to-Action Framework for Agentic Program Repair" [Evaluation / Method]: LLM-based agents have moved automated program repair (APR) from fixed-context patch generation to interactive repository-level repair. Howeve…

  1. EviACT: An Evidence-to-Action Framework for Agentic Program Repair · Score 122
    summary matched "guardrail"; has PDF; has rich summary
  2. Governed Evolution of Agent Runtimes through Executable Operational Cognition · Score 70
    title matched "agent runtime"; has PDF; has rich summary
  3. Prompt Injection Detection is Regime-Dependent: A Deployment-Aware Evaluation with Interpretable Structural Signals · Score 65
    title matched "prompt injection"; has PDF; has rich summary
  4. BAIT: Boundary-Guided Disclosure Escalation via Self-Conditioned Reasoning · Score 45
    summary matched "jailbreak"; has PDF; has rich summary
  5. AlbanianLLMSafety: A Safety Evaluation Dataset for Large Language Models in Albanian · Score 43
    summary matched "guardrail"; has PDF; has rich summary

2026-05-26

3 hits · generated 2026-05-26 13:09:24 (Asia/Shanghai)
Agent Runtime Security: 3 papers

"CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents" [Evaluation / Data / Application / Method]: Reinforcement learning with verifiable rewards (RLVR) has driven breakthroughs in domains such as math, tool-use,…

  1. CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents · Score 62
    title matched "computer-use agent"; has PDF; has rich summary
  2. How Agentic AI Coding Assistants Become the Attacker's Shell · Score 44
    summary matched "prompt injection"; has PDF; has rich summary
  3. AgentHijack: Benchmarking Computer Use Agent Robustness to Common Environment Corruptions · Score 41
    summary matched "computer-use agent"; has PDF; has rich summary

2026-05-22

3 hits · generated 2026-05-22 13:08:19 (Asia/Shanghai)
Agent Runtime Security: 3 papers

"DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback" [Evaluation / Application / Method]: LLM-powered AI agents require high-frequency state exploration (e.g., test-time tree search and reinforcement learn…

  1. DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback · Score 48
    summary matched "agent sandbox"; has PDF; has rich summary
  2. HarnessAPI: A Skill-First Framework for Unified Streaming APIs and MCP Tools · Score 47
    summary matched "agent runtime"; has PDF; has rich summary
  3. Contractual Skills: A GovernSpec Design Framework for Enterprise AI Agents · Score 46
    summary matched "guardrail"; has PDF; has rich summary

2026-05-21

1 hit · generated 2026-05-21 13:14:24 (Asia/Shanghai)
Agent Runtime Security: 1 paper

"Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling" [Application / Method]: Computer-use agents (CUA) automate tasks specified with natural language such as "order the cheapest item from Taco Bell" by gene…

  1. Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling · Score 48
    summary matched "computer-use agent"; has PDF; has rich summary

2026-05-20

7 hits · generated 2026-05-20 13:10:58 (Asia/Shanghai)
Agent Runtime Security: 7 papers

"Attention-Guided Reward for Reinforcement Learning-based Jailbreak against Large Reasoning Models" [Evaluation / Method]: Large Reasoning Models (LRMs) have demonstrated remarkable capabilities in solving complex problems by generat…

  1. Attention-Guided Reward for Reinforcement Learning-based Jailbreak against Large Reasoning Models · Score 80
    title matched "jailbreak"; has PDF; has rich summary
  2. OpenComputer: Verifiable Software Worlds for Computer-Use Agents · Score 80
    title matched "computer-use agent"; has PDF; has rich summary
  3. Robotics-Inspired Guardrails for Foundation Models in Socially Sensitive Domains · Score 80
    title matched "guardrail"; has PDF; has rich summary
  4. A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents · Score 58
    summary matched "agent runtime"; has PDF; has rich summary
  5. Formal Skill: Programmable Runtime Skills for Efficient and Accurate LLM Agents · Score 58
    summary matched "policy enforcement"; has PDF; has rich summary

2026-05-19

4 hits · generated 2026-05-19 13:08:04 (Asia/Shanghai)
Agent Runtime Security: 4 papers

"An Empirical Study of Privacy Leakage Chains via Prompt Injection in Black-Box Chatbot Environments" [Method]: LLM-based chatbot agents increasingly process user requests by combining natural-language reasoning with external…

  1. An Empirical Study of Privacy Leakage Chains via Prompt Injection in Black-Box Chatbot Environments · Score 98
    title matched "prompt injection"; summary matched "indirect prompt injection"; summary matched "jailbreak"
  2. Multilingual jailbreaking of LLMs using low-resource languages · Score 82
    title matched "jailbreak"; summary matched "guardrail"; has PDF
  3. Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks · Score 68
    summary matched "prompt injection"; has PDF; has rich summary
  4. Acoustic Interference: A New Paradigm Weaponizing Acoustic Latent Semantic for Universal Jailbreak against Large Audio Language Models · Score 63
    title matched "jailbreak"; has PDF; has rich summary

2026-05-14

2 hits · generated 2026-05-14 12:52:54 (Asia/Shanghai)
Agent Runtime Security: 2 papers

"Sleeper Channels and Provenance Gates: Persistent Prompt Injection in Always-on Autonomous AI Agents" [Evaluation / Application / Method]: Always-on AI agents (OpenClaw, Hermes Agent) run as a single persistent process under the owner's iden…

  1. Sleeper Channels and Provenance Gates: Persistent Prompt Injection in Always-on Autonomous AI Agents · Score 66
    title matched "prompt injection"; has PDF; has rich summary
  2. LLM-Based Persuasion Enables Guardrail Override in Frontier LLMs · Score 63
    title matched "guardrail"; has PDF; has rich summary

2026-05-13

2 hits · generated 2026-05-13 12:54:34 (Asia/Shanghai)
Agent Runtime Security: 2 papers

"Metaphor Is Not All Attention Needs" [Application / Method]: Large language models are increasingly deployed in safety-critical applications, where their ability to resist harmful instructions is essential. Although post…; "A microser…

  1. Metaphor Is Not All Attention Needs · Score 44
    summary matched "jailbreak"; has PDF; has rich summary
  2. A microservices-based endpoint monitoring platform with predictive NLP models for real-time security and hate-speech risk alerting · Score 42
    summary matched "data exfiltration"; has PDF; has rich summary

2026-05-12

6 hits · generated 2026-05-12 12:42:08 (Asia/Shanghai)
Agent Runtime Security: 6 papers

"Break the Brake, Not the Wheel: Untargeted Jailbreak via Entropy Maximization" [Evaluation / Method]: Recent studies show that gradient-based universal image jailbreaks on vision-language models (VLMs) exhibit little or no cross-mod…

  1. Break the Brake, Not the Wheel: Untargeted Jailbreak via Entropy Maximization · Score 69
    title matched "jailbreak"; has PDF; has rich summary
  2. Intrinsic Guardrails: How Semantic Geometry of Personality Interacts with Emergent Misalignment in LLMs · Score 67
    title matched "guardrail"; has PDF; has rich summary
  3. Re-Triggering Safeguards within LLMs for Jailbreak Detection · Score 67
    title matched "jailbreak"; has PDF; has rich summary
  4. Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing · Score 67
    title matched "jailbreak"; has PDF; has rich summary
  5. RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems · Score 48
    summary matched "prompt injection"; has PDF; has rich summary

2026-05-08

1 hit · generated 2026-05-08 14:15:32 (Asia/Shanghai)
Agent Runtime Security: 1 paper

"Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation" [Evaluation / Application / Method]: Self-hosted computer-use agents (SHCUAs), such as OpenClaw, combine natural-language interaction with direct acce…

  1. Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation · Score 102
    title matched "computer-use agent"; summary matched "prompt injection"; summary matched "indirect prompt injection"