terminal agent Topic Archive

terminal-agent.html
Long-term tracking RSS feed for the keyword "terminal agent", collecting every paper matched so far.
zh-CN
Sun, 28 Jun 2026 05:24:06 +0000

Tmax: A simple recipe for terminal agents
../papers/arxiv-5b7989f3a4e0.html
https://arxiv.org/abs/2606.23321v1#2026-06-23#terminal-agent
Tue, 23 Jun 2026 13:10:02 +0800
Terminal-using agents have quickly become the most popular downstream application of language models (LMs). Despite their prevalence, relatively little academic work has examined RL-based training of these models, likely due to difficult benchmarks, a lack of data, and a lack of simple baseline recipes. We present Tmax, the strongest open RL recipe for terminal agents to date, bringing open data recipes closer to the frontier. While simple, our recipe achieves 27% on Terminal-Bench 2.0 with on…

What Makes Interaction Trajectories Effective for Training Terminal Agents?
../papers/arxiv-d30ae188c67b.html
https://arxiv.org/abs/2606.03461#2026-06-03#terminal-agent
Wed, 03 Jun 2026 14:09:56 +0800
Stronger code agents are commonly assumed to be superior teachers for post-training, yet this assumption remains poorly disentangled from task difficulty, harness design, and student capacity. We investigate this pedagogical link using Terminal-Lego, a scalable pipeline that transforms multi-domain real-world issues into environment-verified agentic tasks. Surprisingly, standalone performance does not dictate teaching efficacy: while Claude Opus 4.6 achieves higher scores on Terminal-Bench 2.0,…

A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression
../papers/arxiv-cedced42e5cf.html
https://arxiv.org/abs/2604.19572v1#2026-04-22#terminal-agent
Wed, 22 Apr 2026 11:37:03 +0800
As model capabilities advance, research has increasingly shifted toward long-horizon, multi-turn terminal-centric agentic tasks, where raw environment feedback is often preserved in the interaction history to support future decisions. However, repeatedly retaining such feedback introduces substantial redundancy and causes cumulative token cost to grow quadratically with the number of steps, hindering long-horizon reasoning. Although observation compression can mitigate this issue, the heterogen…
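
The quadratic growth mentioned in the last abstract can be illustrated with a small sketch. It is not taken from the paper: it assumes every step appends an observation of the same fixed size and that the full history is re-read at each step, and it ignores the system prompt, model outputs, and any prompt caching. The function names and the numbers in the usage note are hypothetical.

```python
def cumulative_prompt_tokens(num_steps: int, tokens_per_observation: int) -> int:
    """Total tokens read over a run when every past observation stays in context.

    Assumption (illustrative only): each step adds one observation of fixed size.
    """
    total = 0
    history = 0
    for _ in range(num_steps):
        history += tokens_per_observation  # the new observation joins the history
        total += history                   # the whole history is read again this step
    return total


def closed_form(num_steps: int, tokens_per_observation: int) -> int:
    """Same quantity as a formula: k * n * (n + 1) / 2, which grows as n squared."""
    return tokens_per_observation * num_steps * (num_steps + 1) // 2
```

Under these assumptions, a 10-step run with 100-token observations reads 5,500 tokens in total, while a 20-step run reads 21,000: doubling the number of steps roughly quadruples the cumulative cost.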