Keyword Tracking

Keyword tracking: prompt injection

This page tracks the keywords set in your configuration over the long term and archives the matching papers by date.

Recent trend

The most recent hit came from LM: Prompt Injection in Automated Résumé Screening with Large Language Models: Single and Multi-Injection Settings

[Trend chart; x-axis dates run from 2026-06-15 to 2026-06-28]

Hit details

Review, by date, the titles of papers that matched this keyword, with source feed information retained.

2026-06-26

2026-06-26 13:16:53 (Asia/Shanghai)

Prompt Injection in Automated Résumé Screening with Large Language Models: Single and Multi-Injection Settings

View original source

Large language models (LLMs) are increasingly used to screen and rank job applicants, creating incentives for candidates to strategically manipulate algorithmic hiring systems. We…

Agent Runtime Security

MIRROR: Novelty-Constrained Memory-Guided MCTS Red-Teaming for Agentic RAG

View original source

Multimodal agentic retrieval-augmented generation (RAG) systems expand the attack surface beyond prompt injection to include text poisoning, image injection, direct-query attacks,…

2026-06-25

2026-06-25 13:11:21 (Asia/Shanghai)

Agent Runtime Security

How Reliable Is Your Jailbreak Judge? Calibration and Adversarial Robustness of Automated ASR Scoring

View original source

Almost every paper on LLM jailbreaks and prompt injection reports an attack-success rate (ASR), and that number is assigned not by people but by an automated judge: either a safet…

Agent Runtime Security

AI Snitches Get Glitches: Towards Evading Agentic Surveillance

View original source

To better assist users with completing challenging tasks, AI agents mediate communications, access data, and interact with different APIs. Many employers (and even nation-states)…

2026-06-23

2026-06-23 13:10:02 (Asia/Shanghai)

Agent Runtime Security

GIF: Locally Sound Geometric Information Flow Control for LLMs

View original source

Large language models increasingly mediate interactions between sensitive data, untrusted inputs, and privileged actions in agentic systems, creating security and privacy risks. T…

2026-06-18

2026-06-18 14:03:08 (Asia/Shanghai)

Agent Runtime Security

CodeSentinel: A Three-Layer Defense Against Indirect Prompt Injection in Code Contexts

View original source

Code large language models increasingly retrieve external code context from repositories, documentation, issue threads, and coding-agent environments, creating an indirect prompt-…

2026-06-16

2026-06-16 14:38:43 (Asia/Shanghai)

Agent Runtime Security

KVEraser: Learning to Steer KV Cache for Efficient Localized Context Erasing

View original source

Post-hoc context erasing over the KV cache is challenging because a local edit has a global consequence: once a span has been processed, its influence propagates into the cached s…

2026-06-12

2026-06-12 13:55:02 (Asia/Shanghai)

Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents

View original source

Web agents driven by large language models (LLMs) are increasingly deployed in real-world environments, where they operate over untrusted web content and execute actions with dire…

Agent Runtime Security

No Hidden Prompts Needed! You Can Game AI Peer Review with Presentation-Only Revisions

View original source

As AI-generated reviews move from experimental tools into peer-review infrastructure, most robustness concerns have focused on explicit attacks such as hidden instructions and pro…

Agent Runtime Security

Nous: An Attempt to Extract and Inject the Cognition Behind Prediction-Market Behavior

View original source

As LLM agents proliferate in prediction markets and collective decision-making, they risk a cognitive monoculture: agents built on shared foundation models produce correlated fore…

2026-06-11

2026-06-11 13:59:12 (Asia/Shanghai)

Agent Runtime Security

External Experience Serving in Production LLM Systems: A Deployment-Oriented Study of Quality-Cost Trade-offs

View original source

Production LLM systems accumulate reusable operational experience, but the practical deployment issue is not merely whether such experience can help. It is how different serving s…

2026-06-10

2026-06-10 13:25:04 (Asia/Shanghai)

Agent Runtime Security

Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

View original source

Large language model (LLM) agents are rapidly moving from conversational interfaces to software components that plan, invoke tools, maintain memory, and act on external environmen…

Agent Runtime Security

Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization

View original source

Production LLMs receive instructions from sources with very different levels of trust, yet attend to every token with uniform architectural privilege. This is the structural vulne…

2026-06-09

2026-06-09 13:12:49 (Asia/Shanghai)

Agent Runtime Security

Brain-Prompt Injection: A Route-Safety Audit for BCI-LLM Agents

View original source

BCI-to-agent pipelines turn decoded neural activity into an authorization channel for tool-use agents, exposing a new attack surface we call brain-prompt injection: signal-…

Agent Runtime Security

PRISM: Recovering Instruction Sets from Language Model Activations

View original source

As LLMs are deployed as agents, reliable monitoring requires knowing not only what they output, but which instructions are steering their behavior. This is difficult when models i…

2026-06-05

2026-06-05 13:25:00 (Asia/Shanghai)

Agent Runtime Security

GuardNet: Ensemble Strategies of Shallow Neural Networks for Robust Prompt Injection and Jailbreak Detection

View original source

Large Language Models (LLMs) have transformed natural language processing, but they remain vulnerable to Prompt Injection (PI) and Jailbreak (JB) attacks. In addition, benchmark e…

2026-06-04

2026-06-04 14:02:06 (Asia/Shanghai)

Agent Runtime Security

What If Prompt Injection Never Left? Exploring Cross-Session Stored Prompt Injection in Agentic Systems

View original source

Modern agentic systems transform LLMs from session-bounded assistants into stateful systems that persist and evolve shared world state across sessions through memories, filesystem…

Agent Runtime Security

Caught in the Act(ivation): Toward Pre-Output and Multi-Turn Detection of Credential Exfiltration by LLM Agents

View original source

LLM agents often place sensitive credentials in the same context window as untrusted retrieved content, creating a direct path for indirect prompt injection to induce credential e…

Agent Runtime Security

From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents

View original source

Memory is a core component of AI agents, enabling them to accumulate knowledge across interactions and improve performance. However, persistent memory introduces the risk of memor…

2026-06-03

2026-06-03 14:09:56 (Asia/Shanghai)

Agent Runtime Security

From Control Boundary to Insurance Claim: Reconstructing AI-Mediated Losses Through the CER Framework

View original source

AI losses that arise through an insured organization's generative or agentic AI system require state reconstruction, not merely event reconstruction, because the relevant state ch…

2026-06-02

2026-06-02 13:56:35 (Asia/Shanghai)

Agent Runtime Security

AgentRedBench: Dynamic Redteaming and Integration-Aware Defense for LLM Agents over SaaS Integrations

View original source

Indirect prompt injection in tool-use agents is a concrete production threat: LLM agents read from integrations (third-party services such as Gmail, Salesforce, or Jira accessed t…

2026-05-28

2026-05-28 13:15:52 (Asia/Shanghai)

Agent Runtime Security

LACUNA: Safe Agents as Recursive Program Holes

View original source

LLM agents increasingly act by writing code, yet a split persists between the runtime that drives the agent and the code the model writes. The runtime owns the loop, context, and…

2026-05-27

2026-05-27 13:23:19 (Asia/Shanghai)

Agent Runtime Security

Prompt Injection Detection is Regime-Dependent: A Deployment-Aware Evaluation with Interpretable Structural Signals

View original source

Prompt injection poses a critical threat to the safe deployment of large language models, yet existing detection approaches are typically evaluated under limited settings that do…

2026-05-26

2026-05-26 13:09:24 (Asia/Shanghai)

Agent Runtime Security

How Agentic AI Coding Assistants Become the Attacker's Shell

View original source

Agentic AI coding assistants can edit files, run commands, and access the internet on behalf of developers. However, their reliance on unvetted external artifacts introduces a new…

2026-05-19

2026-05-19 13:08:04 (Asia/Shanghai)

Agent Runtime Security

An Empirical Study of Privacy Leakage Chains via Prompt Injection in Black-Box Chatbot Environments

View original source

LLM-based chatbot agents increasingly process user requests by combining natural-language reasoning with external tools such as web browsing. These capabilities improve usability,…

Agent Runtime Security

Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks

View original source

Coding agents now run autonomously with shell, file, and network privileges. When a user issues a benign request, the agent sometimes does more than asked: it deletes unrelated fi…

2026-05-14

2026-05-14 12:52:54 (Asia/Shanghai)

Agent Runtime Security

Sleeper Channels and Provenance Gates: Persistent Prompt Injection in Always-on Autonomous AI Agents

View original source

Always-on AI agents (OpenClaw, Hermes Agent) run as a single persistent process under the owner's identity, folding messaging, memory, self-authored skills, scheduling, and shell…

2026-05-12

2026-05-12 12:42:08 (Asia/Shanghai)

Agent Runtime Security

RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems

View original source

This paper demonstrates RUBEN, an interactive tool for discovering minimal rules to explain the outputs of retrieval-augmented large language models (LLMs) in data-driven applicat…

2026-05-08

2026-05-08 14:15:32 (Asia/Shanghai)

Agent Runtime Security

Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation

View original source

Self-hosted computer-use agents (SHCUAs), such as OpenClaw, combine natural-language interaction with direct access to host-side resources, including browsers, files, scripts, sys…

2026-04-14

2026-04-14 11:37:06 (Asia/Shanghai)

LLM

Detecting Safety Violations Across Many Agent Traces

View original source

To identify safety violations, auditors often search over large sets of agent traces. This search is difficult because failures are often rare, complex, and sometimes even adversa…

LLM

ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection

View original source

Tool-augmented Large Language Model (LLM) agents have demonstrated impressive capabilities in automating complex, multi-step real-world tasks, yet remain vulnerable to indirect pr…