agent runtime Topic Archive

agent-runtime.html
Long-term tracking RSS feed for the keyword "agent runtime", collecting all previously matched papers.
zh-CN
Sun, 28 Jun 2026 05:24:06 +0000

Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents
../papers/arxiv-2423b3a60f1a.html
https://arxiv.org/abs/2606.13174v1#2026-06-12#agent-runtime
Fri, 12 Jun 2026 13:55:02 +0800
Interactive LLM agents are becoming part of daily work, but they do not reliably become easier to work with over time: a correction remembered in one session may still be violated in the next. We study this gap between preference access and preference compliance. In tasks derived from anonymized real-user friction cases, Mem0 memory still leaves 57.5% of applicable preference checks violated. We introduce Test-time Rule Acquisition and Compiled Enforcement (TRACE), a drop-in skill-layer pipelin…

WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces
../papers/arxiv-df62d5981d92.html
https://arxiv.org/abs/2606.09426v1#2026-06-09#agent-runtime
Tue, 09 Jun 2026 13:12:49 +0800
Computer-use agents (CUAs) increasingly operate in runtimes that combine visual desktop control, command-line execution, code editing, browsers, and external tools. Existing benchmarks, however, often evaluate these interfaces as separable capabilities, leaving long-horizon cross-interface orchestration under-tested. Thus, we introduce WeaveBench, a long-horizon hybrid-interface benchmark with 114 tasks across 8 real-world work domains, grounded in real user requests and publicly verifiable art…

AgentJet: A Flexible Swarm Training Framework for Agentic Reinforcement Learning
../papers/arxiv-911b2974df53.html
https://arxiv.org/abs/2606.04484#2026-06-04#agent-runtime
Thu, 04 Jun 2026 14:02:06 +0800
We present AgentJet, a distributed swarm training framework for large language model (LLM) agent reinforcement learning.
Unlike centralized frameworks that tightly couple agent rollouts with model optimization, AgentJet adopts a decoupled multi-node architecture in which swarm server nodes host trainable models and run optimization on GPU clusters, whereas swarm client nodes execute arbitrary agents on arbitrary devices. This design provides capabilities that are difficult to support in central…

Governed Evolution of Agent Runtimes through Executable Operational Cognition
../papers/arxiv-1f7ac8243ebb.html
https://arxiv.org/abs/2605.27328v1#2026-05-27#agent-runtime
Wed, 27 May 2026 13:23:19 +0800
Recent advances in agentic systems increasingly treat code as an executable operational substrate rather than as a disposable output artifact. Prior work such as "Code as Agent Harness" frames validated agent-generated artifacts as runtime entities that can be created, executed, revised, persisted, and reused within long-running cognitive loops. However, the governance, lifecycle management, and operational evolution of such artifacts remain under-specified. This paper proposes a framework…

Persistent AI Agents in Academic Research: A Single-Investigator Implementation Case Study
../papers/arxiv-141dfe69e641.html
https://arxiv.org/abs/2605.26870v1#2026-05-27#agent-runtime
Wed, 27 May 2026 13:23:19 +0800
Background: Large language models are typically evaluated as models, benchmarks, or short conversational episodes. Less is known about what happens when an agent is embedded persistently in a real academic research environment with durable memory, local files, external tools, scheduled routines, delegated roles, and explicit safety protocols. Methods: A structured self-observed implementation case study was conducted from January 31 to May 25, 2026.
The unit of analysis was the persistent human…

HarnessAPI: A Skill-First Framework for Unified Streaming APIs and MCP Tools
../papers/arxiv-0f153989bf3c.html
https://arxiv.org/abs/2605.22733v1#2026-05-22#agent-runtime
Fri, 22 May 2026 13:08:19 +0800
Every Python function deployed as an LLM tool must today exist in two forms: an HTTP endpoint for human-facing clients and CI pipelines, and an MCP tool registration for agent runtimes such as Claude and Cursor. These representations share business logic yet diverge in all the surrounding machinery (routing, validation, serialisation, streaming, and schema maintenance), and they drift apart as the underlying code evolves. We present HarnessAPI, a Python framework that eliminates this duplicatio…

A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents
../papers/arxiv-b91697bc94e9.html
https://arxiv.org/abs/2605.20173#2026-05-20#agent-runtime
Wed, 20 May 2026 13:10:58 +0800
Production LLM agents combine stochastic model outputs with deterministic software systems, yet the boundary between the two is rarely treated as a first-class architectural object. This paper names that boundary the stochastic-deterministic boundary (SDB): a four-part contract among a proposer, verifier, commit step, and reject signal that specifies how an LLM output becomes a system action. We argue that the SDB is the load-bearing primitive of production agent runtimes. Around this primitive…

ADEMA: A Knowledge-State Orchestration Architecture for Long-Horizon Knowledge Synthesis with LLM Agents
../papers/arxiv-0733482685d3.html
https://arxiv.org/abs/2604.25849v1#2026-04-29#agent-runtime
Wed, 29 Apr 2026 12:26:28 +0800
Long-horizon LLM tasks often fail not because a single answer is unattainable, but because knowledge states drift across rounds, intermediate commitments remain implicit, and interruption fractures the evolving evidence chain.
This paper presents ADEMA as a knowledge-state orchestration architecture for long-horizon knowledge synthesis rather than as a generic multi-agent runtime. The architecture combines explicit epistemic bookkeeping, heterogeneous dual-evaluator governance, adaptive task-mo…