Keyword Tracking

Keyword tracking: coding agent

This page continuously tracks the keywords you have configured and archives the matching papers by date.

Recent trend

Most recent hit (from the Terminal and SWE Agents feed): A Deterministic Control Plane for LLM Coding Agents

[Trend chart covering 2026-06-15 to 2026-06-28]

Matched papers

Browse, by date, the titles of papers that matched this keyword, along with the source feed each one came from.

2026-06-26

2026-06-26 13:16:53 (Asia/Shanghai)

Terminal and SWE Agents

A Deterministic Control Plane for LLM Coding Agents

LLM coding harnesses grant agents broad file and shell access, yet the configuration layer that steers them -- rules files, agent definitions, IDE-specific markdown -- is largely…

Terminal and SWE Agents

NOVA: A Verification-Aware Agent Harness for Architecture Evolution in Industrial Recommender Systems

Industrial advertising recommender models are continuously improved through architecture evolution. Upgrades such as RankMixer, TokenMixer-Large, and MixFormer show that better st…

Terminal and SWE Agents

Mostly Automatic Translation of Language Interpreters from C to Safe Rust

Translating C programs to safe Rust is challenging owing to significant differences in typing constraints, ownership, and borrowing rules. Interpreter programs are particularly im…

Terminal and SWE Agents

The Spec Growth Engine: Spec-Anchored, Code-Coupled, Drift-Enforced Architecture for AI-Assisted Software Development

AI coding agents dramatically accelerate implementation speed but introduce two structural failure modes that existing spec-driven approaches do not fully solve: (1) context explo…

2026-06-24

2026-06-24 13:06:49 (Asia/Shanghai)

Terminal and SWE Agents

NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?

We introduce NatureBench, a cross-discipline benchmark of 90 tasks distilled from peer-reviewed Nature-family publications, designed to evaluate whether AI coding agents can move…

Terminal and SWE Agents

Bayesian control for coding agents

Modern coding agents pair LLM generators with various tools, including cheap diagnostics and expensive verifiers. The tool-use decisions are typically governed by orchestrators th…

Terminal and SWE Agents

Detecting AI Coding Agents in Open Source: A Validated Multi-Method Census of 180 Million Repositories

Generative AI coding agents are entering the open-source supply chain, yet their diverse and often invisible traces leave their prevalence poorly understood. We introduce a multi-…

2026-06-19

2026-06-19 14:26:15 (Asia/Shanghai)

Terminal and SWE Agents

Probe-and-Refine Tuning of Repository Guidance for Coding Agents

LLM-based coding agents need higher-level operational knowledge about a repository (which files house which subsystems, how to run the test suite, which workflows have historicall…

Terminal and SWE Agents

N-Version Programming with Coding Agents

This paper revisits the classical concept on N-version programming in the setting of contemporary AI coding agents. Revisiting the seminal Knight-Leveson experiment, we study whet…

2026-06-18

2026-06-18 14:03:08 (Asia/Shanghai)

Terminal and SWE Agents

Data Intelligence Agents: Interpreting, Modeling, and Querying Enterprise Data via Autonomous Coding Agents

Production data integration is bottlenecked by repeated, lossy handoffs between data owners, engineers, and analysts who must collaboratively discover, structure, and query enterp…

2026-06-17

2026-06-17 14:22:19 (Asia/Shanghai)

Terminal and SWE Agents

All Smoke, No Alarm: Oracle Signals in Agent-Authored Test Code

Software practitioners increasingly use AI coding agents that generate test code alongside production code in open source pull requests (PRs). Recent studies report more than 932,…

Terminal and SWE Agents

GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?

Game generation is an emerging application of coding agents, requiring models to transform natural-language specifications into playable interactive systems. Unlike traditional co…

Terminal and SWE Agents

Position: Coding Benchmarks Are Misaligned with Agentic Software Engineering

Coding agents have become a major mode of software engineering, but the benchmarks we use to compare them were designed in a pre-agent era: they collapse model, harness, and envir…

2026-06-16

2026-06-16 14:38:43 (Asia/Shanghai)

Context-Aware RL for Agentic and Multimodal LLMs

Large language models (LLMs) often fail when answering requires identifying a small but decisive piece of evidence within a long or complex context, such as a single line in a too…

Terminal and SWE Agents

Agent trajectories as programs: fingerprinting and programming coding-agent behavior

Benchmark scores tell you what an agent got right; they do not tell you how it got there. In this work, we introduce methods for comparing agents procedurally in different context…

Terminal and SWE Agents

Towards LLM Accelerated Rapid Reviews for Software Tool Discovery -- Case for Log Anomaly Detection

In software engineering research, the primary outcome is frequently a tool. However, for practitioners and academics alike, it is hard to tell which tools are maintained and do th…

2026-06-12

2026-06-12 13:55:02 (Asia/Shanghai)

AgentBeats: Agentifying Agent Assessment for Openness, Standardization, and Reproducibility

Agent systems are advancing quickly across domains, but their evaluation remains fragmented. Most benchmarks rely on fixed, LLM-centric harnesses that require heavy integration, c…

Agent Runtime Security

Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents

Interactive LLM agents are becoming part of daily work, but they do not reliably become easier to work with over time: a correction remembered in one session may still be violated…

Terminal and SWE Agents

Understanding the Rejection of Fixes Generated by Agentic Pull Requests -- Insights from the AIDev Dataset

AI coding agents are increasingly used to generate pull requests (PRs) that propose code fixes in software projects. From a first exploration of the AIDev dataset, we find that 46…

Terminal and SWE Agents

Recursive Agent Harnesses

Recursive language models (RLMs) showed that recursion over model calls is an effective strategy for long-context reasoning, and production coding agents have begun to write code…

2026-06-11

2026-06-11 13:59:12 (Asia/Shanghai)

Terminal and SWE Agents

PROJECTMEM: A Local-First, Event-Sourced Memory and Judgment Layer for AI Coding Agents

AI coding assistants now support a growing share of software work, from quick scripts to production applications. Yet these agents remain largely stateless: each new session re-re…

2026-06-10

2026-06-10 13:25:04 (Asia/Shanghai)

Terminal and SWE Agents

Frontier Coding Agents Use Metaprogramming to Adapt to Unfamiliar Programming Languages

LLM-based coding agents are usually evaluated in familiar software settings: mainstream languages, common libraries, and public repositories. These benchmarks remain important, bu…

Terminal and SWE Agents

AutoPDE: Reliable Agentic PDE Solving via Explicitly Represented Solver Strategies

Numerical solvers for partial differential equations (PDEs) are core computational tools in science and engineering. Building reliable PDE solvers requires not only executable cod…

2026-06-09

2026-06-09 13:12:49 (Asia/Shanghai)

Terminal and SWE Agents

SIGA: Self-Evolving Coding-Agent Adapters for Scientific Simulation

Advanced scientific simulators expose specialized input languages that turn simulation goals into executable configurations, but learning them can cost domain scientists hours to…

2026-06-05

2026-06-05 13:25:00 (Asia/Shanghai)

Terminal and SWE Agents

ADK Arena: Evaluating Agent Development Kits via LLM-as-a-Developer

The rapid proliferation of Agent Development Kits (ADKs), SDK-level frameworks for building LLM-powered autonomous agents, has outpaced any empirical understanding of how framewor…

Terminal and SWE Agents

Coding with "Enemy": Can Human Developers Detect AI Agent Sabotage?

AI coding agents are increasingly embedded in real-world software development, collaborating with human developers while gaining broader access to codebases and tools. This create…

Terminal and SWE Agents

Converted, Not Equivalent: Benchmarking Codebase Conversion via Observational Equivalence

Coding agents increasingly act as codebase-scale collaborators that can assist with codebase conversion, but this progress has exposed a critical weakness: agents often over-trust…

2026-06-04

2026-06-04 14:02:06 (Asia/Shanghai)

Terminal and SWE Agents

Can Generalist Agents Automate Data Curation?

Curating training data is among the most consequential yet labor-intensive parts of modern AI development: practitioners iteratively propose, implement, evaluate, and revise data…

Terminal and SWE Agents

Trustworthy AI Software Engineers

With the rapid rise of AI coding agents, the fundamental premise of what it means to be a software engineer is in question. In this vision paper, we examine what it means for an A…

2026-06-03

2026-06-03 14:09:56 (Asia/Shanghai)

Terminal and SWE Agents

Cross-Lingual Token Arbitrage: Optimizing Code Agent Context Windows via Local LLM Preprocessing

AI-assisted coding agents are bottlenecked by input-token cost. Two pathologies of raw human input drive much of this overhead: tokenization inefficiency for non-English text and…

Terminal and SWE Agents

Handoff Debt: The Rediscovery Cost When Coding Agents Take Over Interrupted Tasks

Coding-agent benchmarks evaluate whether a single uninterrupted agent can resolve a repository issue. Real software work is messier: tasks are interrupted, reassigned, reviewed, a…

Terminal and SWE Agents

Human-AI Collaboration and the Transformation of Software Engineering Work

The integration of Generative AI (GenAI) and Agentic AI into software development is reconfiguring software engineering from an activity centered on human authorship of code into…

2026-06-02

2026-06-02 13:56:35 (Asia/Shanghai)

Terminal and SWE Agents

SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction

Agent skills occupy a privileged position in the agent workflow, as agents are expected to implicitly follow and execute them, rendering third-party skills a vulnerable attack sur…

2026-05-29

2026-05-29 13:18:32 (Asia/Shanghai)

Terminal and SWE Agents

Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software

Are AI agents tools, co-authors, or researchers? We present a quantified case study ($N=1$): a physicist supervising an AI coding agent (Claude Code, Sonnet and Opus models) over…

Terminal and SWE Agents

Discovering Cooperative Pipelines: Autoresearch for Sequential Social Dilemmas

We study two-level autoresearch for cooperation: an outer-loop AI agent autonomously redesigns the inner-loop pipeline of an LLM policy-synthesis system for multi-agent Sequential…

2026-05-22

2026-05-22 13:08:19 (Asia/Shanghai)

Terminal and SWE Agents

"Refactoring Runaway": Understanding and Mitigating Tangled Refactorings in Coding Agents for Issue Resolution

Recent advances in coding agents have shown remarkable progress in software issue resolution. In practice, real-world issues are typically bug fixes or feature requests in which h…

Terminal and SWE Agents

Why Are Agentic Pull Requests Merged or Rejected? An Empirical Study

AI coding agents increasingly submit pull requests (Agentic-PRs) to open-source repositories, yet their performance is commonly assessed using merge and rejection outcomes alone.…

2026-05-21

2026-05-21 13:14:24 (Asia/Shanghai)

Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

Diagnosing failures in LLM agents remains largely manual. Practitioners inspect a small subset of execution traces, form ad-hoc hypotheses, and iterate. This process misses patter…

Terminal and SWE Agents

SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents

As long-horizon coding agents produce more code than any developer can review, oversight collapses onto a single surface: the automated test suite. Reward hacking naturally arises…

2026-05-20

2026-05-20 13:10:58 (Asia/Shanghai)

Agent Runtime Security

TwinRouterBench: Fast Static and Live Dynamic Evaluation for Realistic Agentic LLM Routing

LLM routing matters most in long-horizon applications such as coding agents, deep research systems, and computer-use agents, where a single user request triggers many model calls.…

Terminal and SWE Agents

Does Code Cleanliness Affect Coding Agents? A Controlled Minimal-Pair Study

As autonomous coding agents see rapid adoption, their evaluation has primarily focused on task completion rates holding the target codebase fixed. This leaves a critical question…

Terminal and SWE Agents

PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents

Large language model (LLM) agents increasingly operate over long and recurring external contexts, like document corpora and code repositories. Across invocations, existing approac…

Terminal and SWE Agents

RoadmapBench: Evaluating Long-Horizon Agentic Software Development Across Version Upgrades

Coding agents are increasingly deployed in real software development, where a single version iteration requires months of coordinated work across many files. However, most existin…

2026-05-19

2026-05-19 13:08:04 (Asia/Shanghai)

Agent Runtime Security

Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks

Coding agents now run autonomously with shell, file, and network privileges. When a user issues a benign request, the agent sometimes does more than asked: it deletes unrelated fi…

Terminal and SWE Agents

Reversa: A Reverse Documentation Engineering Framework for Converting Legacy Software into Operational Specifications for AI Agents

Legacy systems concentrate business rules, architectural decisions, and operational exceptions that often remain implicit in code, data, configuration, and maintenance practices.…

2026-05-15

2026-05-15 14:57:29 (Asia/Shanghai)

Terminal and SWE Agents

Remember Your Trace: Memory-Guided Long-Horizon Agentic Framework for Consistent and Hierarchical Repository-Level Code Documentation

Automated code documentation is essential for modern software development, providing the contextual grounding that both human developers and coding agents rely on to navigate larg…

Terminal and SWE Agents

SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades

Coding agents powered by large language models are increasingly expected to perform realistic software maintenance tasks beyond isolated issue resolution. Existing benchmarks have…

Terminal and SWE Agents

Documentation-Guided Agentic Codebase Migration from C to Rust

Migrating legacy C repositories to Rust promises stronger memory safety, but existing translators often work at the level of files or functions and miss architectural intent. We p…

Terminal and SWE Agents

Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack

Agent benchmarks have become the de facto measure of frontier AI competence, guiding model selection, investment, and deployment. However, reward hacking, where agents maximize a…

2026-05-07

2026-05-07 12:38:06 (Asia/Shanghai)

Agentic Vulnerability Reasoning on Windows COM Binaries

Windows Component Object Model (COM) services run with elevated privileges and are widely accessible to authenticated users, making race conditions in these binaries a critical su…

2026-05-06

2026-05-06 12:37:23 (Asia/Shanghai)

MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents

Coding agents often pass per-prompt safety review yet ship exploitable code when their tasks are decomposed into routine engineering tickets. The challenge is structural: existing…

2026-05-01

2026-05-01 12:53:56 (Asia/Shanghai)

Exploring Interaction Paradigms for LLM Agents in Scientific Visualization

This paper examines how different types of large language model (LLM) agents perform on scientific visualization (SciVis) tasks, where users generate visualization workflows from…

2026-04-23

2026-04-23 11:42:13 (Asia/Shanghai)

LLM

SWE-chat: Coding Agent Interactions From Real Users in the Wild

AI coding agents are being adopted at scale, yet we lack empirical evidence on how people actually use them and how much of their output is useful in practice. We present SWE-chat…

2026-04-16

2026-04-16 11:43:00 (Asia/Shanghai)

LLM

Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents

Memory-based self-evolution has emerged as a promising paradigm for coding agents. However, existing approaches typically restrict memory utilization to homogeneous task domains,…

2026-04-15

2026-04-15 11:35:50 (Asia/Shanghai)

LLM

Drawing on Memory: Dual-Trace Encoding Improves Cross-Session Recall in LLM Agents

LLM agents with persistent memory store information as flat factual records, providing little context for temporal reasoning, change tracking, or cross-session aggregation. Inspir…

2026-04-14

2026-04-14 11:37:06 (Asia/Shanghai)

LLM

From Translation to Superset: Benchmark-Driven Evolution of a Production AI Agent from Rust to Python

Cross-language migration of large software systems is a persistent engineering challenge, particularly when the source codebase evolves rapidly. We present a methodology for LLM-a…

2026-04-08

2026-04-08 17:10:24 (Asia/Shanghai)

LLM

Gym-Anything: Turn any Software into an Agent Environment

Computer-use agents hold the promise of assisting in a wide range of digital economic activities. However, current research has largely focused on short-horizon tasks over a limit…