Keyword Tracking

关键词追踪：agent

这个页面会长期追踪你配置里关心的关键词，并把命中的论文按日期沉淀下来。

返回归档首页查看趋势总览最新 JSON 订阅 RSS

近期走势

最近一次命中来自 LM：Joint Learning of Experiential Rules and Policies for Large Language Model Agents

2026-06-15

2026-06-16

2026-06-17

2026-06-18

2026-06-19

2026-06-20

2026-06-21

2026-06-22

2026-06-23

2026-06-24

2026-06-25

2026-06-26

2026-06-27

2026-06-28

命中明细

按日期回看匹配到这个关键词的论文标题，并保留来源 feed 信息。

2026-06-26

2026-06-26 13:16:53 (Asia/Shanghai)

Joint Learning of Experiential Rules and Policies for Large Language Model Agents

查看原始来源

For LLM agents in multi-step interactive environments, a key challenge is to make effective use of accumulated interaction experience. Existing work has typically separated two us…

Semantic Early-Stopping for Iterative LLM Agent Loops

查看原始来源

Multi-agent large language model (LLM) loops, for example a Writer that drafts and a Critic that revises, are almost always terminated by a fixed iteration cap (max_iterations). T…

When Does Combining Language Models Help? A Co-Failure Ceiling on Routing, Voting, and Mixture-of-Agents Across 67 Frontier Models

查看原始来源

Multi-model LLM systems such as routing, voting, cascades, fusion, and mixture-of-agents are used to beat single-model accuracy. We show that their gain is capped by a quantity th…

查看原始来源

AI coding agents dramatically accelerate implementation speed but introduce two structural failure modes that existing spec-driven approaches do not fully solve: (1) context explo…

查看原始来源

Resolving issues with ambiguous and incomplete descriptions, particularly concerning complex bugs, requires a sophisticated, long-horizon workflow. Agents must navigate codebases…

查看原始来源

As large language model (LLM) agents are applied to longer tasks, they increasingly modify workspace state across multiple rounds of iteration. However, agents typically observe o…

2026-06-23

2026-06-23 13:10:02 (Asia/Shanghai)

MAS-PromptBench: When Does Prompt Optimization Improve Multi-Agent LLM Systems?

查看原始来源

Multi-agent systems (MAS) offer a scalable path forward for agentic AI, comprising multiple LLM-based agents, each assigned a system prompt and a position within a workflow that g…

查看原始来源

Terminal-using agents have quickly become the most popular downstream application of language models (LMs). Despite their prevalence, relatively little academic work has examined…

查看原始来源

This paper revisits the classical concept on N-version programming in the setting of contemporary AI coding agents. Revisiting the seminal Knight-Leveson experiment, we study whet…

2026-06-18

2026-06-18 14:03:08 (Asia/Shanghai)

Enhancing Decision-Making with Large Language Models through Multi-Agent Fictitious Play

查看原始来源

Large language model (LLM)-based multi-agent systems (MAS) have demonstrated great potential in solving tasks with execution complexity, by distributing subtasks across cooperativ…

A Technical Taxonomy of LLM Agent Communication Protocols

查看原始来源

As large language models (LLMs) advance and multi-agent systems aim to overcome the limits of standalone agents, robust communication protocols are becoming essential infrastructu…

查看原始来源

Production data integration is bottlenecked by repeated, lossy handoffs between data owners, engineers, and analysts who must collaboratively discover, structure, and query enterp…

查看原始来源

Coding agents have become a major mode of software engineering, but the benchmarks we use to compare them were designed in a pre-agent era: they collapse model, harness, and envir…

查看原始来源

In software engineering research, the primary outcome is frequently a tool. However, for practitioners and academics alike, it is hard to tell which tools are maintained and do th…

查看原始来源

Recursive language models (RLMs) showed that recursion over model calls is an effective strategy for long-context reasoning, and production coding agents have begun to write code…

查看原始来源

Custom AI agents areagents that live inside their own application, talk to their own data and tools, enforce their own security boundaries, and carry their own brand and audit tra…

查看原始来源

As the capabilities of LLM-based code agents continue to advance, their expected role is expanding beyond localized bug fixing in existing codebases toward architecting and implem…

查看原始来源

The performance of LLM-based agents is jointly shaped by their base models and the harnesses that mediate their interaction with the environment. Because different models exhibit…

查看原始来源

Coding agents increasingly act as codebase-scale collaborators that can assist with codebase conversion, but this progress has exposed a critical weakness: agents often over-trust…

查看原始来源

With the rapid rise of AI coding agents, the fundamental premise of what it means to be a software engineer is in question. In this vision paper, we examine what it means for an A…

2026-06-03

2026-06-03 14:09:56 (Asia/Shanghai)

Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning

查看原始来源

Large language models improve final-answer accuracy through extended chain-of-thought reasoning, but often spend tokens inefficiently and offer little inference-time control. Exis…

查看原始来源

Autonomous LLM training is often framed as recipe search, which leaves the training harness largely static. This limitation sharpens in agentic RL, where shifting bottlenecks and…

Terminal and SWE Agents

Human-AI Collaboration and the Transformation of Software Engineering Work

查看原始来源

The integration of Generative AI (GenAI) and Agentic AI into software development is reconfiguring software engineering from an activity centered on human authorship of code into…

查看原始来源

Agent skills occupy a privileged position in the agent workflow, as agents are expected to implicitly follow and execute them, rendering third-party skills a vulnerable attack sur…

2026-05-29

2026-05-29 13:18:32 (Asia/Shanghai)

KVoiceBench, KOpenAudioBench, and KMMAU: Agent-Driven Korean Speech Benchmarks for Evaluating SpeechLMs

查看原始来源

Speech language models (SpeechLMs) have achieved substantial progress by extending large language models (LLMs) to the speech modality. However, SpeechLM evaluation remains heavil…

LegalGraphRAG: Multi-Agent Graph Retrieval-Augmented Generation for Reliable Legal Reasoning

查看原始来源

Graph-based Retrieval-Augmented Generation (GraphRAG) advances flat document retrieval by structuring knowledge as relational graphs, enabling more coherent and effective reasonin…

查看原始来源

We study two-level autoresearch for cooperation: an outer-loop AI agent autonomously redesigns the inner-loop pipeline of an LLM policy-synthesis system for multi-agent Sequential…

查看原始来源

Agentic AI systems capable of autonomous planning and extended environmental interaction pose a fundamental control problem: how can humans maintain meaningful oversight of system…

查看原始来源

Background: Large language models are typically evaluated as models, benchmarks, or short conversational episodes. Less is known about what happens when an agent is embedded persi…

查看原始来源

Autonomous computer use agents that powered by multimodal large language models (MLLMs) are emerging as capable assistants for completing complex digital workflows. However, real-…

查看原始来源

AI coding agents increasingly submit pull requests (Agentic-PRs) to open-source repositories, yet their performance is commonly assessed using merge and rejection outcomes alone.…

2026-05-21

2026-05-21 13:14:24 (Asia/Shanghai)

What Twelve LLM Agent Benchmark Papers Disclose About Themselves: A Pilot Audit and an Open Scoring Schema

查看原始来源

We read twelve well-known LLM agent benchmark papers and recorded, dimension by dimension, what each paper actually says about how its evaluation was run. The motivation came from…

Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

查看原始来源

Diagnosing failures in LLM agents remains largely manual. Practitioners inspect a small subset of execution traces, form ad-hoc hypotheses, and iterate. This process misses patter…

DeepWeb-Bench: A Deep Research Benchmark Demanding Massive Cross-Source Evidence and Long-Horizon Derivation

查看原始来源

Deep research, in which an agent searches the open web, collects evidence, and derives an answer through extended reasoning, is a prominent use case for frontier language models.…

查看原始来源

As long-horizon coding agents produce more code than any developer can review, oversight collapses onto a single surface: the automated test suite. Reward hacking naturally arises…

2026-05-20

2026-05-20 13:10:58 (Asia/Shanghai)

Robust Checkpoint Selection for Multimodal LLMs via Agentic Evaluation and Stability-Aware Ranking

查看原始来源

Checkpoint selection for multimodal large language models (MLLMs) presents significant challenges when performance differentials are marginal and evaluation signals are prone to n…

ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning

查看原始来源

Large language models (LLMs) and agentic systems have shown promise for clinical decision support, but existing works largely assume that evidence has already been curated and han…

FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

查看原始来源

Vision-Language Models (VLMs) have demonstrated remarkable capabilities in general video understanding, yet they often struggle with the fine-grained comprehension crucial for rea…

查看原始来源

While current software agents powered by large language models (LLMs) and agentic reinforcement learning (RL) can boost programmer productivity, their training data (e.g., GitHub…

查看原始来源

Legacy systems concentrate business rules, architectural decisions, and operational exceptions that often remain implicit in code, data, configuration, and maintenance practices.…

2026-05-15 14:57:29 (Asia/Shanghai)

Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks

查看原始来源

We introduce a reusable framework for auditing whether LLM attack benchmarks collectively cover the threat surface: a 4$\times$6 Target $\times$ Technique matrix grounded in STRID…

SpeakerLLM: A Speaker-Specialized Audio-LLM for Speaker Understanding and Verification Reasoning

查看原始来源

As audio-first agents become increasingly common in physical AI, conversational robots, and screenless wearables, audio large language models (audio-LLMs) must integrate speaker-s…

查看原始来源

Agent benchmarks have become the de facto measure of frontier AI competence, guiding model selection, investment, and deployment. However, reward hacking, where agents maximize a…

查看原始来源

The integration of Large Language Models (LLMs) into Electronic Design Automation (EDA) and hardware security is rapidly reshaping the semiconductor industry. While LLMs offer unp…

From Controlled to the Wild: Evaluation of Pentesting Agents for the Real-World

查看原始来源

AI pentesting agents are increasingly credible as offensive security systems, but current benchmarks still provide limited guidance on which will perform best in real-world target…

Agent Runtime Security

MATRA: Modeling the Attack Surface of Agentic AI Systems -- OpenClaw Case Study

查看原始来源

LLMs are increasingly deployed as autonomous agents with access to tools, databases, and external services, yet practitioners (across different sectors) lack systematic methods to…

查看原始来源

Coding agents often pass per-prompt safety review yet ship exploitable code when their tasks are decomposed into routine engineering tickets. The challenge is structural: existing…

2026-05-05

2026-05-05 12:20:54 (Asia/Shanghai)

MemORAI: Memory Organization and Retrieval via Adaptive Graph Intelligence for LLM Conversational Agents

查看原始来源

Large Language Models (LLMs) lack persistent memory for long-term personalized conversations. Existing graph-based memory systems suffer from information dilution, absent provenan…

Verbal-R3: Verbal Reranker as the Missing Bridge between Retrieval and Reasoning

查看原始来源

The conventional Retrieval-Augmented Generation (RAG) paradigm of injecting raw retrieved texts into the Large Language Model (LLM)'s context often results in suboptimal integrati…

查看原始来源

Large language model (LLM) agents deployed in clinical settings often exhibit abrupt, threshold-driven behavior, offering little visibility into accumulating risk prior to escalat…

查看原始来源

Scientific workflow systems automate execution -- scheduling, fault tolerance, resource management -- but not the semantic translation that precedes it. Scientists still manually…

查看原始来源

Training multimodal agents via reinforcement learning for knowledge-intensive visual reasoning is fundamentally hindered by the extreme sparsity of outcome-based supervision and t…

查看原始来源

At present, executable visual workflows have emerged as a mainstream paradigm in real-world industrial deployments, offering strong reliability and controllability. However, in cu…

查看原始来源

BACKGROUND: Clinical trial enrollment in oncology remains critically low, with fewer than 5% of eligible adults participating, in large part due to the complexity and labor intens…

查看原始来源

Vision-language models (VLM) have markedly advanced AI-driven interpretation and reporting of complex medical imaging, such as computed tomography (CT). Yet, existing methods larg…

查看原始来源

BACKGROUND AND OBJECTIVES: Traditional medical board examinations present clinical information in static vignettes with multiple-choices (MC), fundamentally different from how phy…

查看原始来源

BACKGROUND: Artificial intelligence-powered conversational agents (ie, chatbots) are increasingly popular outlets for users seeking psychological support, yet little is known abou…

查看原始来源

International audience

查看原始来源

Multi-agent systems are a core research area in distributed artificial intelligence, where algorithm design faces challenges due to complex nonlinear environments. This paper pres…

查看原始来源

Recent progress in artificial intelligence (AI) is unlocking transformative capabilities for mathematics. There is great hope that AI will help solve major open problems and auton…