Last 7 days: 60 papers
Feed Subscription
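Suited to tracking a single research direction over the long term. This page summarizes the feed's activity over the last 7 and 30 days, and keeps each day's matched entries along with links to that day's digest.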
Last 30 days: 105 papers
All time: 105 papers
"Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents" [Evaluation / Application / Method]: Long-horizon enterprise agents make high-stakes decisions (loan underwriting, claims adjudication, clinical review, prior authorization)…
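If this feed also matches keywords in your configuration, a long-term tracking entry point appears here.
Review this feed's matched papers day by day. Each day's digest is kept as its original Markdown / JSON output.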
"MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval" [Evaluation / Data / Method]: Mathematical problem solving remains a challenging test of reasoning for large language and multimodal models, yet existing…
"CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas" [Evaluation / Method]: It is increasingly important that LLM agents interact effectively and safely with other goal-pursuing agents, yet, re…
"GeoAgentBench: A Dynamic Execution Benchmark for Tool-Augmented Agents in Spatial Analysis" [Evaluation / Application / Method]: The integration of Large Language Models (LLMs) into Geographic Information Systems (GIS) marks a paradigm shift…
"Parallax: Why AI Agents That Think Must Never Act" [Evaluation / Application / Method]: Autonomous AI agents are rapidly transitioning from experimental tools to operational infrastructure, with projections that 80% of enterprise application…
"UniToolCall: Unifying Tool-Use Representation, Data, and Evaluation for LLM Agents" [Evaluation / Data / Application / Method]: Tool-use capability is a fundamental component of LLM agents, enabling them to interact with external systems throu…
15 papers included. Highlights include "Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework" and "Topological Characterization of Churn Flow and Unsupervised Correction to the Wu Flow-Regime Map in Small-Diameter Vertic…