Feed Subscription
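Suited to long-term tracking of a single research direction. This page summarizes the feed's performance over the last 7 and 30 days, and keeps each day's matched entries together with links to the digests.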
Last 7 days: 15 papers
Last 30 days: 64 papers
All time: 85 papers
Terminal and SWE Agents has no new matching papers today.
If this feed also matches keywords in your configuration, a long-term tracking entry point is shown here.
Review this feed's matched papers day by day. Each day's digest is kept as its raw Markdown / JSON output.
"Smaller Models, Unexpected Costs: Trade-offs in LLM Quantization for Automated Program Repair" [Evaluation / Method]: Large Language Models (LLMs) are powerful tools and have been increasingly adopted for complex software engineering tasks…
"Unlocking Model Potentials Through Adaptive Multi-Agent Scaffolding for Efficient Issue Resolution" [Evaluation / Application / Method]: Resolving issues with ambiguous and incomplete descriptions, particularly concerning complex bugs, requi…
"SHERLOC: Structured Diagnostic Localization for Code Repair Agents" [Method]: LLM agents solve repository-level coding tasks through multi-turn tool use, but utilize half their budget on locating faults before editing. Dedic…
"Tmax: A simple recipe for terminal agents" [Evaluation / Data / Application / Method]: Terminal-using agents have quickly become the most popular downstream application of language models (LMs). Despite their prevalence, relatively little acad…
"Probe-and-Refine Tuning of Repository Guidance for Coding Agents" [Application / Method]: LLM-based coding agents need higher-level operational knowledge about a repository (which files house which subsystems, how to run the test sui…
"Data Intelligence Agents: Interpreting, Modeling, and Querying Enterprise Data via Autonomous Coding Agents" [Evaluation / Application / Method]: Production data integration is bottlenecked by repeated, lossy handoffs between data owners, en…
"All Smoke, No Alarm: Oracle Signals in Agent-Authored Test Code" [Method]: Software practitioners increasingly use AI coding agents that generate test code alongside production code in open source pull requests (PRs). Recent…
"Agent trajectories as programs: fingerprinting and programming coding-agent behavior" [Evaluation / Data / Application / Method]: Benchmark scores tell you what an agent got right; they do not tell you how it got there. In this work, we introd…
"Understanding the Rejection of Fixes Generated by Agentic Pull Requests -- Insights from the AIDev Dataset" [Data / Method]: AI coding agents are increasingly used to generate pull requests (PRs) that propose code fixes in sof…
"PROJECTMEM: A Local-First, Event-Sourced Memory and Judgment Layer for AI Coding Agents" [Application / Method]: AI coding assistants now support a growing share of software work, from quick scripts to production applications. Yet th…
"Frontier Coding Agents Use Metaprogramming to Adapt to Unfamiliar Programming Languages" [Evaluation / Method]: LLM-based coding agents are usually evaluated in familiar software settings: mainstream languages, common libraries, and…
"SIGA: Self-Evolving Coding-Agent Adapters for Scientific Simulation" [Method]: Advanced scientific simulators expose specialized input languages that turn simulation goals into executable configurations, but learning them ca…
"ADK Arena: Evaluating Agent Development Kits via LLM-as-a-Developer" [Evaluation / Method]: The rapid proliferation of Agent Development Kits (ADKs), SDK-level frameworks for building LLM-powered autonomous agents, has outpaced any…
"Latent Anchor-Driven Test Generation for Deep Neural Networks" [Data / Application / Method]: Deep Neural Networks (DNNs) are increasingly being deployed in security-critical and safety-sensitive applications, which makes rigorous test…
"What Makes Interaction Trajectories Effective for Training Terminal Agents?" [Method]: Stronger code agents are commonly assumed to be superior teachers for post-training, yet this assumption remains poorly disentangled from…
"SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction" [Evaluation / Application / Method]: Agent skills occupy a privileged position in the agent workflow, as agents are expected to implicitly follow and execute them, re…
"Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software" [Application / Method]: Are AI agents tools, co-authors, or researchers? We present a quantified case study ($N=1$): a physicist sup…
"Calibrating Conservatism for Scalable Oversight" [Method]: Agentic AI systems capable of autonomous planning and extended environmental interaction pose a fundamental control problem: how can humans maintain meaningful…
"'Refactoring Runaway': Understanding and Mitigating Tangled Refactorings in Coding Agents for Issue Resolution" [Method]: Recent advances in coding agents have shown remarkable progress in software issue resolution. In pract…
"SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents" [Evaluation / Method]: As long-horizon coding agents produce more code than any developer can review, oversight collapses onto a single surface: the automated test s…
"Does Code Cleanliness Affect Coding Agents? A Controlled Minimal-Pair Study" [Evaluation / Application / Method]: As autonomous coding agents see rapid adoption, their evaluation has primarily focused on task completion rates holding the tar…
"Same Signal, Different Semantics: A Cross-Framework Behavioral Analysis of Software Engineering Agents" [Application / Method]: Behavioral studies of LLM-based software engineering agents extract operational rules about which traject…
"CRANE: Constrained Reasoning Injection for Code Agents via Nullspace Editing" [Evaluation / Method]: Code agents must both reason over long-horizon repository state and obey strict tool-use protocols. In paired Instruct/Thinking che…