BAIT: Boundary-Guided Disclosure Escalation via Self-Conditioned Reasoning

Paper Overview

In this work, we propose BAIT (Boundary-Aware Iterative Trap), a three-step jailbreak framework that approaches malicious goals through internal disclosure. BAIT first asks the model to identify the protection boundary,…

Canonical Key

arxiv:2605.27110

Merged Sources

arXiv

Authors

Xuan Luo, Yue Wang, Geng Tu, Jing Li, Ruifeng Xu

Categories

cs.CR, cs.CL

Tags

Method

Topics

Language Model / Large Language Model

First Seen

2026-05-27 13:23:19 (UTC+08:00)

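Sources and External Links

Canonical entry points for this paper on each source are listed first, followed by the current abstract page and PDF.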

arXiv PDF

Feed History

Lists, by archive time, the feeds in which this paper has appeared, with a link to each day's digest output.

Agent Runtime Security

2026-05-27

2026-05-27 13:23:19 (Asia/Shanghai)

Score 45 · summary matched "jailbreak"; has PDF; has rich summary

Related Recommendations

StressEval: Failure-Driven Dynamic Benchmarking for Knowledge-Intensive Reasoning in Large Language Models

Medmarks: A Comprehensive Open-Source LLM Benchmark Suite for Medical Tasks

Maistros: A Greek Large Language Model Adapted Through Knowledge Distillation From Large Reasoning Models

Spatiotemporal Hidden-State Dynamics as a Signature of Internal Reasoning in Large Language Models

Misaligned by Reward: Socially Undesirable Preferences in LLMs

FT-RAG: A Fine-grained Retrieval-Augmented Generation Framework for Complex Table Reasoning