Feed Subscription

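Terminal and SWE Agents: Pinned Subscription Page

Best suited for tracking a single research direction over the long term. This page summarizes the feed's activity over the last 7 and 30 days, and keeps each day's matched entries together with a link to that day's digest.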

Last 7 days: 15 papers, 4 active digests

Last 30 days: 64 papers, 16 active digests

All time: 85 papers, 23 active digests

Recent trend

Terminal and SWE Agents has no new matching papers today.

Date        Matches
2026-06-15  0
2026-06-16  3
2026-06-17  5
2026-06-18  1
2026-06-19  3
2026-06-20  0
2026-06-21  0
2026-06-22  0
2026-06-23  1
2026-06-24  5
2026-06-25  2
2026-06-26  7
2026-06-27  0
2026-06-28  0
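
The 7-day summary can be cross-checked against the daily counts in the recent trend. The following is a minimal Python sketch, not the page's actual implementation: the function name `window_stats` is hypothetical, and it assumes that an "active digest" is any day with at least one match, a definition inferred from the numbers rather than stated by the page.

```python
from datetime import date, timedelta

# Daily match counts copied from the recent-trend listing.
DAILY_COUNTS = {
    date(2026, 6, 15): 0,
    date(2026, 6, 16): 3,
    date(2026, 6, 17): 5,
    date(2026, 6, 18): 1,
    date(2026, 6, 19): 3,
    date(2026, 6, 20): 0,
    date(2026, 6, 21): 0,
    date(2026, 6, 22): 0,
    date(2026, 6, 23): 1,
    date(2026, 6, 24): 5,
    date(2026, 6, 25): 2,
    date(2026, 6, 26): 7,
    date(2026, 6, 27): 0,
    date(2026, 6, 28): 0,
}


def window_stats(counts, end, days):
    """Return (papers, active_days) for the `days`-day window ending on `end`, inclusive.

    Assumption: a day counts as "active" when it has at least one match.
    """
    start = end - timedelta(days=days - 1)
    in_window = [n for d, n in counts.items() if start <= d <= end]
    papers = sum(in_window)
    active_days = sum(1 for n in in_window if n > 0)
    return papers, active_days


papers_7d, active_7d = window_stats(DAILY_COUNTS, date(2026, 6, 28), 7)
print(papers_7d, active_7d)  # prints "15 4"
```

Under that assumption the result agrees with the "Last 7 days" figures of 15 papers and 4 active digests. The 30-day and all-time figures cannot be checked the same way, because the trend only covers 14 days.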

Related keyword pages

If this feed also matches keywords in your configuration, long-term tracking links for them appear here.

Match history

Review this feed's matched papers day by day. Each day's digest is kept as raw Markdown / JSON artifacts.

2026-06-26

7 matches · generated 2026-06-26 13:16:53 (Asia/Shanghai)
Terminal and SWE Agents: 7 papers

Smaller Models, Unexpected Costs: Trade-offs in LLM Quantization for Automated Program Repair [Evaluation / Method]: Language Models (LLMs) are powerful tools and have been increasingly adopted for complex software engineering tasks…

  1. Smaller Models, Unexpected Costs: Trade-offs in LLM Quantization for Automated Program Repair · Score 108
    title matched "program repair"; title matched "automated program repair"; has PDF
  2. To Run or Not to Run: Analyzing the Cost-Effectiveness of Code Execution in LLM-Based Program Repair · Score 83
    title matched "program repair"; summary matched "SWE-bench"; has PDF
  3. How Much Static Structure Do Code Agents Need? A Study of Deterministic Anchoring · Score 65
    title matched "code agent"; has PDF; has rich summary
  4. A Deterministic Control Plane for LLM Coding Agents · Score 64
    title matched "coding agent"; has PDF; has rich summary
  5. NOVA: A Verification-Aware Agent Harness for Architecture Evolution in Industrial Recommender Systems · Score 47
    summary matched "coding agent"; has PDF; has rich summary

2026-06-25

2 matches · generated 2026-06-25 13:11:21 (Asia/Shanghai)
Terminal and SWE Agents: 2 papers

Unlocking Model Potentials Through Adaptive Multi-Agent Scaffolding for Efficient Issue Resolution [Evaluation / Application / Method]: Resolving issues with ambiguous and incomplete descriptions, particularly concerning complex bugs, requi…

  1. Unlocking Model Potentials Through Adaptive Multi-Agent Scaffolding for Efficient Issue Resolution · Score 78
    title matched "issue resolution"; summary matched "SWE-bench"; has PDF
  2. Evaluating LLMs on Real-World Software Performance Optimization · Score 38
    summary matched "repository-level"; has PDF; has rich summary

2026-06-24

5 matches · generated 2026-06-24 13:06:49 (Asia/Shanghai)
Terminal and SWE Agents: 5 papers

SHERLOC: Structured Diagnostic Localization for Code Repair Agents [Method]: LLM agents solve repository-level coding tasks through multi-turn tool use, but utilize half their budget on locating faults before editing. Dedic…

  1. SHERLOC: Structured Diagnostic Localization for Code Repair Agents · Score 105
    title matched "code repair"; summary matched "SWE-bench"; summary matched "repository-level"
  2. NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers? · Score 65
    title matched "coding agent"; has PDF; has rich summary
  3. Bayesian control for coding agents · Score 64
    title matched "coding agent"; has PDF; has rich summary
  4. Detecting AI Coding Agents in Open Source: A Validated Multi-Method Census of 180 Million Repositories · Score 63
    title matched "coding agent"; has PDF; has rich summary
  5. LemonHarness Technical Report · Score 39
    summary matched "Terminal-Bench"; has PDF; has rich summary

2026-06-23

1 match · generated 2026-06-23 13:10:02 (Asia/Shanghai)
Terminal and SWE Agents: 1 paper

Tmax: A simple recipe for terminal agents [Evaluation / Data / Application / Method]: Terminal-using agents have quickly become the most popular downstream application of language models (LMs). Despite their prevalence, relatively little acad…

  1. Tmax: A simple recipe for terminal agents · Score 84
    title matched "terminal agent"; summary matched "Terminal-Bench"; has PDF

2026-06-19

3 matches · generated 2026-06-19 14:26:15 (Asia/Shanghai)
Terminal and SWE Agents: 3 papers

Probe-and-Refine Tuning of Repository Guidance for Coding Agents [Application / Method]: LLM-based coding agents need higher-level operational knowledge about a repository (which files house which subsystems, how to run the test sui…

  1. Probe-and-Refine Tuning of Repository Guidance for Coding Agents · Score 87
    title matched "coding agent"; summary matched "SWE-bench"; has PDF
  2. Phoenix: Safe GitHub Issue Resolution via Multi-Agent LLMs · Score 83
    title matched "issue resolution"; summary matched "SWE-bench"; has PDF
  3. N-Version Programming with Coding Agents · Score 63
    title matched "coding agent"; has PDF; has rich summary

2026-06-18

1 match · generated 2026-06-18 14:03:08 (Asia/Shanghai)
Terminal and SWE Agents: 1 paper

Data Intelligence Agents: Interpreting, Modeling, and Querying Enterprise Data via Autonomous Coding Agents [Evaluation / Application / Method]: Production data integration is bottlenecked by repeated, lossy handoffs between data owners, en…

  1. Data Intelligence Agents: Interpreting, Modeling, and Querying Enterprise Data via Autonomous Coding Agents · Score 69
    title matched "coding agent"; has PDF; has rich summary

2026-06-17

5 matches · generated 2026-06-17 14:22:19 (Asia/Shanghai)
Terminal and SWE Agents: 5 papers

All Smoke, No Alarm: Oracle Signals in Agent-Authored Test Code [Method]: Software practitioners increasingly use AI coding agents that generate test code alongside production code in open source pull requests (PRs). Recent…

  1. All Smoke, No Alarm: Oracle Signals in Agent-Authored Test Code · Score 46
    summary matched "coding agent"; has PDF; has rich summary
  2. LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling · Score 44
    summary matched "SWE-bench"; has PDF; has rich summary
  3. VoidPadding: Let [VOID] Handle Padding in Masked Diffusion Language Models so that [EOS] Can Focus on Semantic Termination · Score 44
    summary matched "code generation benchmark"; has PDF; has rich summary
  4. GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine? · Score 42
    summary matched "coding agent"; has PDF; has rich summary
  5. Position: Coding Benchmarks Are Misaligned with Agentic Software Engineering · Score 40
    summary matched "coding agent"; has PDF; has rich summary

2026-06-16

3 matches · generated 2026-06-16 14:38:43 (Asia/Shanghai)
Terminal and SWE Agents: 3 papers

Agent trajectories as programs: fingerprinting and programming coding-agent behavior [Evaluation / Data / Application / Method]: Benchmark scores tell you what an agent got right; they do not tell you how it got there. In this work, we introd…

  1. Agent trajectories as programs: fingerprinting and programming coding-agent behavior · Score 64
    summary matched "SWE-bench"; summary matched "coding agent"; has PDF
  2. Towards LLM Accelerated Rapid Reviews for Software Tool Discovery -- Case for Log Anomaly Detection · Score 44
    summary matched "coding agent"; has PDF; has rich summary
  3. No Resource, No Benchmarks, No Problem? Evaluating and Improving LLMs for Code Generation in No-Resource Languages · Score 44
    summary matched "code generation benchmark"; has PDF; has rich summary

2026-06-12

2 matches · generated 2026-06-12 13:55:02 (Asia/Shanghai)
Terminal and SWE Agents: 2 papers

Understanding the Rejection of Fixes Generated by Agentic Pull Requests -- Insights from the AIDev Dataset [Data / Method]: AI coding agents are increasingly used to generate pull requests (PRs) that propose code fixes in sof…

  1. Understanding the Rejection of Fixes Generated by Agentic Pull Requests -- Insights from the AIDev Dataset · Score 57
    summary matched "coding agent"; has DOI; has PDF
  2. Recursive Agent Harnesses · Score 47
    summary matched "coding agent"; has PDF; has rich summary

2026-06-11

3 matches · generated 2026-06-11 13:59:12 (Asia/Shanghai)
Terminal and SWE Agents: 3 papers

PROJECTMEM: A Local-First, Event-Sourced Memory and Judgment Layer for AI Coding Agents [Application / Method]: AI coding assistants now support a growing share of software work, from quick scripts to production applications. Yet th…

  1. PROJECTMEM: A Local-First, Event-Sourced Memory and Judgment Layer for AI Coding Agents · Score 69
    title matched "coding agent"; has PDF; has rich summary
  2. Exploration Structure in LLM Agents for Multi-File Change Localization · Score 59
    summary matched "SWE-bench"; summary matched "SWE bench"; has PDF
  3. Agents All the Way Down; A Methodology for Building Custom AI Agents from Substrate to Production · Score 39
    summary matched "code agent"; has PDF; has rich summary

2026-06-10

3 matches · generated 2026-06-10 13:25:04 (Asia/Shanghai)
Terminal and SWE Agents: 3 papers

Frontier Coding Agents Use Metaprogramming to Adapt to Unfamiliar Programming Languages [Evaluation / Method]: LLM-based coding agents are usually evaluated in familiar software settings: mainstream languages, common libraries, and…

  1. Frontier Coding Agents Use Metaprogramming to Adapt to Unfamiliar Programming Languages · Score 103
    title matched "coding agent"; summary matched "Terminal-Bench"; summary matched "SWE-bench"
  2. AutoPDE: Reliable Agentic PDE Solving via Explicitly Represented Solver Strategies · Score 60
    summary matched "coding agent"; summary matched "code agent"; has PDF
  3. DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch · Score 60
    summary matched "code agent"; summary matched "bug fixing"; has PDF

2026-06-09

3 matches · generated 2026-06-09 13:12:49 (Asia/Shanghai)
Terminal and SWE Agents: 3 papers

SIGA: Self-Evolving Coding-Agent Adapters for Scientific Simulation [Method]: Advanced scientific simulators expose specialized input languages that turn simulation goals into executable configurations, but learning them ca…

  1. SIGA: Self-Evolving Coding-Agent Adapters for Scientific Simulation · Score 48
    summary matched "coding agent"; has PDF; has rich summary
  2. From 0-to-1 to 1-to-N: Reproducible Engineering Evidence for MetaAI Recursive Self-Design · Score 46
    summary matched "SWE-bench"; has PDF; has rich summary
  3. Self-Harness: Harnesses That Improve Themselves · Score 44
    summary matched "Terminal-Bench"; has PDF; has rich summary

2026-06-05

10 matches · generated 2026-06-05 13:25:00 (Asia/Shanghai)
Terminal and SWE Agents: 10 papers

ADK Arena: Evaluating Agent Development Kits via LLM-as-a-Developer [Evaluation / Method]: The rapid proliferation of Agent Development Kits (ADKs), SDK-level frameworks for building LLM-powered autonomous agents, has outpaced any…

  1. ADK Arena: Evaluating Agent Development Kits via LLM-as-a-Developer · Score 94
    summary matched "Terminal-Bench"; summary matched "SWE-bench"; summary matched "coding agent"
  2. Asuka-Bench: Benchmarking Code Agents on Underspecified User Intent and Multi-Round Refinement · Score 80
    title matched "code agent"; has PDF; has rich summary
  3. Knowledge Matters: Injecting Project and Testing Knowledge into LLM-based Unit Test Generation · Score 80
    title matched "test generation"; has PDF; has rich summary
  4. SmellBench: Towards Fine-Grained Evaluation of Code Agents on Refactoring Tasks · Score 80
    title matched "code agent"; has PDF; has rich summary
  5. From Failed Trajectories to Reliable LLM Agents: Diagnosing and Repairing Harness Flaws · Score 76
    summary matched "Terminal-Bench"; summary matched "SWE-bench"; has PDF

2026-06-04

6 matches · generated 2026-06-04 14:02:06 (Asia/Shanghai)
Terminal and SWE Agents: 6 papers

Latent Anchor-Driven Test Generation for Deep Neural Networks [Data / Application / Method]: Deep Neural Networks (DNNs) are increasingly being deployed in security-critical and safety-sensitive applications, which makes rigorous test…

  1. Latent Anchor-Driven Test Generation for Deep Neural Networks · Score 79
    title matched "test generation"; has PDF; has rich summary
  2. Can Generalist Agents Automate Data Curation? · Score 57
    summary matched "coding agent"; has PDF; has rich summary
  3. Not All Errors Are Equal: Consequence-Aware Reasoning Compute Allocation · Score 57
    summary matched "SWE-bench"; has PDF; has rich summary
  4. The Meta-Agent Challenge: Are Current Agents Capable of Autonomous Agent Development? · Score 57
    summary matched "code agent"; has PDF; has rich summary
  5. The Saturation Trap and the Subjectivity of Intervention Timing: Why Affect-Based Triggers and LLM Judges Fail to Time Interventions on Autonomous Agents · Score 57
    summary matched "SWE-bench"; has PDF; has rich summary

2026-06-03

9 matches · generated 2026-06-03 14:09:56 (Asia/Shanghai)
Terminal and SWE Agents: 9 papers

What Makes Interaction Trajectories Effective for Training Terminal Agents? [Method]: Stronger code agents are commonly assumed to be superior teachers for post-training, yet this assumption remains poorly disentangled from…

  1. What Makes Interaction Trajectories Effective for Training Terminal Agents? · Score 115
    title matched "terminal agent"; summary matched "Terminal-Bench"; summary matched "code agent"
  2. Cross-Lingual Token Arbitrage: Optimizing Code Agent Context Windows via Local LLM Preprocessing · Score 97
    title matched "code agent"; summary matched "coding agent"; has PDF
  3. Dependency-Guided Repository-Level C-to-Rust Translation with Reinforcement Alignment · Score 97
    title matched "repository-level"; summary matched "repository level"; has PDF
  4. Handoff Debt: The Rediscovery Cost When Coding Agents Take Over Interrupted Tasks · Score 79
    title matched "coding agent"; has PDF; has rich summary
  5. VulnAgent-R2: Evidence-Calibrated Multi-Agent Auditing for Repository-Level Vulnerability Detection · Score 79
    title matched "repository-level"; has PDF; has rich summary

2026-06-02

1 match · generated 2026-06-02 13:56:35 (Asia/Shanghai)
Terminal and SWE Agents: 1 paper

SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction [Evaluation / Application / Method]: Agent skills occupy a privileged position in the agent workflow, as agents are expected to implicitly follow and execute them, re…

  1. SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction · Score 47
    summary matched "coding agent"; has PDF; has rich summary

2026-05-29

2 matches · generated 2026-05-29 13:18:32 (Asia/Shanghai)
Terminal and SWE Agents: 2 papers

Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software [Application / Method]: Are AI agents tools, co-authors, or researchers? We present a quantified case study ($N=1$): a physicist sup…

  1. Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software · Score 48
    summary matched "coding agent"; has PDF; has rich summary
  2. Discovering Cooperative Pipelines: Autoresearch for Sequential Social Dilemmas · Score 45
    summary matched "coding agent"; has PDF; has rich summary

2026-05-28

1 match · generated 2026-05-28 13:15:52 (Asia/Shanghai)
Terminal and SWE Agents: 1 paper

Calibrating Conservatism for Scalable Oversight [Method]: Agentic AI systems capable of autonomous planning and extended environmental interaction pose a fundamental control problem: how can humans maintain meaningful…

  1. Calibrating Conservatism for Scalable Oversight · Score 48
    summary matched "SWE-bench"; has PDF; has rich summary

2026-05-22

3 matches · generated 2026-05-22 13:08:19 (Asia/Shanghai)
Terminal and SWE Agents: 3 papers

"Refactoring Runaway": Understanding and Mitigating Tangled Refactorings in Coding Agents for Issue Resolution [Method]: Recent advances in coding agents have shown remarkable progress in software issue resolution. In pract…

  1. "Refactoring Runaway": Understanding and Mitigating Tangled Refactorings in Coding Agents for Issue Resolution · Score 125
    title matched "coding agent"; title matched "issue resolution"; summary matched "SWE-bench"
  2. TerminalWorld: Benchmarking Agents on Real-World Terminal Tasks · Score 45
    summary matched "Terminal-Bench"; has PDF; has rich summary
  3. Why Are Agentic Pull Requests Merged or Rejected? An Empirical Study · Score 45
    summary matched "coding agent"; has PDF; has rich summary

2026-05-21

1 match · generated 2026-05-21 13:14:24 (Asia/Shanghai)
Terminal and SWE Agents: 1 paper

SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents [Evaluation / Method]: As long-horizon coding agents produce more code than any developer can review, oversight collapses onto a single surface: the automated test s…

  1. SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents · Score 69
    title matched "coding agent"; has PDF; has rich summary

2026-05-20

5 matches · generated 2026-05-20 13:10:58 (Asia/Shanghai)
Terminal and SWE Agents: 5 papers

Does Code Cleanliness Affect Coding Agents? A Controlled Minimal-Pair Study [Evaluation / Application / Method]: As autonomous coding agents see rapid adoption, their evaluation has primarily focused on task completion rates holding the tar…

  1. Does Code Cleanliness Affect Coding Agents? A Controlled Minimal-Pair Study · Score 80
    title matched "coding agent"; has PDF; has rich summary
  2. PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents · Score 58
    summary matched "coding agent"; has PDF; has rich summary
  3. RoadmapBench: Evaluating Long-Horizon Agentic Software Development Across Version Upgrades · Score 58
    summary matched "coding agent"; has PDF; has rich summary
  4. The Growing Pains of Frontier Models: When Leaderboards Stop Separating and What to Measure Next · Score 58
    summary matched "SWE-bench"; has PDF; has rich summary
  5. Toward Training Superintelligent Software Agents through Self-Play SWE-RL · Score 58
    summary matched "SWE-bench"; has PDF; has rich summary

2026-05-19

3 matches · generated 2026-05-19 13:08:04 (Asia/Shanghai)
Terminal and SWE Agents: 3 papers

Same Signal, Different Semantics: A Cross-Framework Behavioral Analysis of Software Engineering Agents [Application / Method]: Behavioral studies of LLM-based software engineering agents extract operational rules about which traject…

  1. Same Signal, Different Semantics: A Cross-Framework Behavioral Analysis of Software Engineering Agents · Score 83
    title matched "software engineering agent"; summary matched "SWE-bench"; has PDF
  2. SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution · Score 62
    summary matched "Terminal-Bench"; summary matched "SWE-bench"; has PDF
  3. Reversa: A Reverse Documentation Engineering Framework for Converting Legacy Software into Operational Specifications for AI Agents · Score 48
    summary matched "coding agent"; has PDF; has rich summary

2026-05-15

6 matches · generated 2026-05-15 14:57:29 (Asia/Shanghai)
Terminal and SWE Agents: 6 papers

CRANE: Constrained Reasoning Injection for Code Agents via Nullspace Editing [Evaluation / Method]: Code agents must both reason over long-horizon repository state and obey strict tool-use protocols. In paired Instruct/Thinking che…

  1. CRANE: Constrained Reasoning Injection for Code Agents via Nullspace Editing · Score 115
    title matched "code agent"; summary matched "Terminal-Bench"; summary matched "SWE-bench"
  2. Remember Your Trace: Memory-Guided Long-Horizon Agentic Framework for Consistent and Hierarchical Repository-Level Code Documentation · Score 97
    title matched "repository-level"; summary matched "coding agent"; has PDF
  3. SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades · Score 97
    title matched "coding agent"; summary matched "issue resolution"; has PDF
  4. Documentation-Guided Agentic Codebase Migration from C to Rust · Score 75
    summary matched "coding agent"; summary matched "repository-level"; has PDF
  5. Comparing Developer and LLM Biases in Code Evaluation · Score 57
    summary matched "code editing"; has PDF; has rich summary