Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks

Paper overview

We introduce a reusable framework for auditing whether LLM attack benchmarks collectively cover the threat surface: a 4$\times$6 Target $\times$ Technique matrix grounded in STRIDE, constructed from a 507-leaf taxonomy…

Canonical key

arxiv:2605.15118

Merged sources

arXiv

Authors

Karthik Raghu Iyer, Yazdan Jamshidi, Nicholas Bray, Alexey A. Shvets

Categories

cs.CR, cs.CL

Tags

Evaluation / Data / Methods

Topics

Evaluation / LLM

First seen

2026-05-15 14:57:29 (UTC+08:00)

Sources and external links

arXiv PDF

Feed history

LM

2026-05-15

2026-05-15 14:57:29 (Asia/Shanghai)

Score 202 · title matched "LLM"; title matched "RAG"; title matched "benchmark"

Related recommendations

Probabilistic Agents in Deterministic Audits: Evaluating Multi-Agent Systems for Automated Audits Based on the German IT-Grundschutz

Qiskit QuantumKatas: Adapting Microsoft's Quantum Computing exercises for LLM evaluation

Automated Benchmark Auditing for AI Agents and Large Language Models

CrossView Suite: Harnessing Cross-view Spatial Intelligence of MLLMs with Dataset, Model and Benchmark

Traceable Knowledge Graph Reasoning Enables LLM-Assisted Decision Support for Industrial VOCs in the Steel Industry

LLM-ReSum: A Framework for LLM Reflective Summarization through Self-Evaluation