Keyword Tracking

关键词追踪：terminal bench

这个页面会长期追踪你配置里关心的关键词，并把命中的论文按日期沉淀下来。

近期走势

最近一次命中来自 LM：What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Legible Evaluation Design

2026-06-15

2026-06-16

2026-06-17

2026-06-18

2026-06-19

2026-06-20

2026-06-21

2026-06-22

2026-06-23

2026-06-24

2026-06-25

2026-06-26

2026-06-27

2026-06-28

命中明细

按日期回看匹配到这个关键词的论文标题，并保留来源 feed 信息。

2026-05-01

2026-05-01 12:53:56 (Asia/Shanghai)

What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Legible Evaluation Design

查看原始来源

Terminal-agent benchmarks have become a primary signal for measuring the coding and system-administration capabilities of large language models. As the market for evaluation envir…