Feed Subscription

LLM Pinned Subscription Page

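Suited to tracking a single research direction over the long term. This page summarizes the feed's activity over the last 7 and 30 days, and keeps each day's raw matched entries and digest links.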

Last 7 days

0 papers, 0 active digests

Last 30 days

0 papers, 0 active digests

All history

135 papers, 9 active digests
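
The three summary figures above can be reproduced from per-day match counts. The sketch below is only an illustration, not the page's actual implementation: the function name `window_stats` is hypothetical, "today" is assumed to be 2026-06-28 (the last date in the trend data), a window is assumed to include today, and an "active digest" is assumed to mean a day with at least one match. The daily counts are the ones listed in the match history on this page.

```python
from datetime import date, timedelta

# Per-day match counts, copied from the match history on this page.
DAILY_HITS = {
    date(2026, 4, 24): 15,
    date(2026, 4, 23): 15,
    date(2026, 4, 22): 15,
    date(2026, 4, 21): 15,
    date(2026, 4, 17): 15,
    date(2026, 4, 16): 15,
    date(2026, 4, 15): 15,
    date(2026, 4, 14): 15,
    date(2026, 4, 8): 15,
}


def window_stats(daily_hits, today, days=None):
    """Return (papers, active_digests) for a trailing window.

    days=None means all history. Assumptions: the window covers `today`
    plus the `days - 1` preceding days, and a digest counts as active
    when its day has at least one match.
    """
    if days is None:
        counts = list(daily_hits.values())
    else:
        start = today - timedelta(days=days - 1)
        counts = [n for d, n in daily_hits.items() if start <= d <= today]
    active = [n for n in counts if n > 0]
    return sum(active), len(active)


if __name__ == "__main__":
    today = date(2026, 6, 28)  # assumed reference date
    print(window_stats(DAILY_HITS, today, 7))   # (0, 0)
    print(window_stats(DAILY_HITS, today, 30))  # (0, 0)
    print(window_stats(DAILY_HITS, today))      # (135, 9)
```

Under these assumptions the nine listed days of 15 matches each give the 135 papers and 9 active digests shown for all history, and neither the 7-day nor the 30-day window reaches back to the most recent digest on 2026-04-24.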

Recent trend

LLM has no new matched papers today.

Daily matched papers, 2026-06-15 through 2026-06-28: 0 on each of the 14 days.

Related keyword pages

If this feed also matches keywords in your configuration, long-term tracking links for them appear here.

Match history

Review this feed's matched papers day by day. Each day's digest is kept as raw Markdown / JSON output.

2026-04-24

15 matches · Generated 2026-04-24 11:46:20 (Asia/Shanghai)
LLM: 15 papers

"Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows" [Evaluation / Application / Method]: The Model Context Protocol (MCP) has become a common interface…

  1. Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows · Score 106
    title matched "agent"; summary matched "reasoning"; summary matched "benchmark"
  2. Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems · Score 106
    title matched "agent"; summary matched "reasoning"; summary matched "benchmark"
  3. AgenticQwen: Training Small Agentic Language Models with Dual Data Flywheels for Industrial-Scale Tool Use · Score 102
    title matched "agent"; summary matched "reasoning"; summary matched "benchmark"
  4. Temporal Taskification in Streaming Continual Learning: A Source of Evaluation Instability · Score 90
    title matched "evaluation"; summary matched "benchmark"; has PDF
  5. Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models · Score 90
    title matched "agent"; summary matched "reasoning"; has PDF

2026-04-23

15 matches · Generated 2026-04-23 11:42:13 (Asia/Shanghai)
LLM: 15 papers

"OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Model" [Evaluation / Method]: Large vision-language models (LVLMs) have made substantial advances in reasoning tasks at the Olympiad level. Neverth…

  1. OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Model · Score 129
    title matched "reasoning"; title matched "benchmark"; summary matched "evaluation"
  2. V-tableR1: Process-Supervised Multimodal Table Reasoning with Critic-Guided Policy Optimization · Score 125
    title matched "reasoning"; summary matched "alignment"; summary matched "benchmark"
  3. ONOTE: Benchmarking Omnimodal Notation Processing for Expert-level Music Intelligence · Score 124
    title matched "benchmark"; summary matched "reasoning"; summary matched "alignment"
  4. Where and What: Reasoning Dynamic and Implicit Preferences in Situated Conversational Recommendation · Score 106
    title matched "reasoning"; summary matched "alignment"; summary matched "benchmark"
  5. Cooperative Profiles Predict Multi-Agent LLM Team Performance in AI for Science Workflows · Score 105
    title matched "agent"; summary matched "reasoning"; summary matched "benchmark"

2026-04-22

15 matches · Generated 2026-04-22 11:37:03 (Asia/Shanghai)
LLM: 15 papers

"Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents" [Evaluation / Application / Method]: Long-horizon enterprise agents make high-stakes decisions (loan underwriting, claims adjudication, clinical review, prior authorization)…

  1. Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents · Score 162
    title matched "agent"; title matched "alignment"; summary matched "reasoning"
  2. Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps · Score 149
    title matched "agent"; title matched "benchmark"; title matched "evaluation"
  3. Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment · Score 145
    title matched "agent"; title matched "alignment"; summary matched "reasoning"
  4. Discovering a Shared Logical Subspace: Steering LLM Logical Reasoning via Alignment of Natural-Language and Symbolic Views · Score 130
    title matched "reasoning"; title matched "alignment"; summary matched "benchmark"
  5. Revac: A Social Deduction Reasoning Agent · Score 127
    title matched "agent"; title matched "reasoning"; summary matched "evaluation"

2026-04-21

15 matches · Generated 2026-04-21 11:40:46 (Asia/Shanghai)
LLM: 15 papers

"MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval" [Evaluation / Data / Method]: Mathematical problem solving remains a challenging test of reasoning for large language and multimodal models, yet existing…

  1. MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval · Score 112
    title matched "reasoning"; title matched "benchmark"; has PDF
  2. Benchmarking System Dynamics AI Assistants: Cloud Versus Local LLMs on CLD Extraction and Discussion · Score 108
    title matched "benchmark"; summary matched "reasoning"; summary matched "evaluation"
  3. ClawEnvKit: Automatic Environment Generation for Claw-Like Agents · Score 107
    title matched "agent"; summary matched "benchmark"; summary matched "evaluation"
  4. MASS-RAG: Multi-Agent Synthesis Retrieval-Augmented Generation · Score 107
    title matched "agent"; summary matched "reasoning"; summary matched "benchmark"
  5. OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation · Score 106
    title matched "reasoning"; summary matched "agent"; summary matched "benchmark"

2026-04-17

15 matches · Generated 2026-04-17 11:39:21 (Asia/Shanghai)
LLM: 15 papers

"CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas" [Evaluation / Method]: It is increasingly important that LLM agents interact effectively and safely with other goal-pursuing agents, yet, re…

  1. CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas · Score 130
    title matched "agent"; title matched "benchmark"; summary matched "reasoning"
  2. IE as Cache: Information Extraction Enhanced Agentic Reasoning · Score 124
    title matched "agent"; title matched "reasoning"; summary matched "benchmark"
  3. QuantCode-Bench: A Benchmark for Evaluating the Ability of Large Language Models to Generate Executable Algorithmic Trading Strategies · Score 123
    title matched "benchmark"; summary matched "agent"; summary matched "alignment"
  4. From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench · Score 122
    title matched "agent"; summary matched "reasoning"; summary matched "benchmark"
  5. An Axiomatic Benchmark for Evaluation of Scientific Novelty Metrics · Score 109
    title matched "benchmark"; title matched "evaluation"; has PDF

2026-04-16

15 matches · Generated 2026-04-16 11:43:00 (Asia/Shanghai)
LLM: 15 papers

"GeoAgentBench: A Dynamic Execution Benchmark for Tool-Augmented Agents in Spatial Analysis" [Evaluation / Application / Method]: The integration of Large Language Models (LLMs) into Geographic Information Systems (GIS) marks a paradigm shift…

  1. GeoAgentBench: A Dynamic Execution Benchmark for Tool-Augmented Agents in Spatial Analysis · Score 162
    title matched "agent"; title matched "benchmark"; summary matched "reasoning"
  2. HINTBench: Horizon-agent Intrinsic Non-attack Trajectory Benchmark · Score 127
    title matched "agent"; title matched "benchmark"; summary matched "evaluation"
  3. Character Beyond Speech: Leveraging Role-Playing Evaluation in Audio Large Language Models via Reinforcement Learning · Score 120
    title matched "evaluation"; summary matched "agent"; summary matched "reasoning"
  4. LongCoT: Benchmarking Long-Horizon Chain-of-Thought Reasoning · Score 112
    title matched "reasoning"; title matched "benchmark"; has PDF
  5. Correct Prediction, Wrong Steps? Consensus Reasoning Knowledge Graph for Robust Chain-of-Thought Synthesis · Score 108
    title matched "reasoning"; summary matched "benchmark"; summary matched "evaluation"

2026-04-15

15 matches · Generated 2026-04-15 11:35:50 (Asia/Shanghai)
LLM: 15 papers

"Parallax: Why AI Agents That Think Must Never Act" [Evaluation / Application / Method]: Autonomous AI agents are rapidly transitioning from experimental tools to operational infrastructure, with projections that 80% of enterprise application…

  1. Parallax: Why AI Agents That Think Must Never Act · Score 107
    title matched "agent"; summary matched "reasoning"; summary matched "evaluation"
  2. Drawing on Memory: Dual-Trace Encoding Improves Cross-Session Recall in LLM Agents · Score 107
    title matched "agent"; summary matched "reasoning"; summary matched "benchmark"
  3. Round-Trip Translation Reveals What Frontier Multilingual Benchmarks Miss · Score 106
    title matched "benchmark"; summary matched "reasoning"; summary matched "evaluation"
  4. Towards Long-horizon Agentic Multimodal Search · Score 106
    title matched "agent"; summary matched "reasoning"; summary matched "benchmark"
  5. QuarkMedSearch: A Long-Horizon Deep Search Agent for Exploring Medical Intelligence · Score 105
    title matched "agent"; summary matched "benchmark"; summary matched "evaluation"

2026-04-14

15 matches · Generated 2026-04-14 11:37:06 (Asia/Shanghai)
LLM: 15 papers

"UniToolCall: Unifying Tool-Use Representation, Data, and Evaluation for LLM Agents" [Evaluation / Data / Application / Method]: Tool-use capability is a fundamental component of LLM agents, enabling them to interact with external systems throu…

  1. UniToolCall: Unifying Tool-Use Representation, Data, and Evaluation for LLM Agents · Score 145
    title matched "agent"; title matched "evaluation"; summary matched "reasoning"
  2. General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks · Score 130
    title matched "reasoning"; title matched "benchmark"; summary matched "evaluation"
  3. Collaborative Multi-Agent Scripts Generation for Enhancing Imperfect-Information Reasoning in Murder Mystery Games · Score 129
    title matched "agent"; title matched "reasoning"; summary matched "benchmark"
  4. FM-Agent: Scaling Formal Methods to Large Systems via LLM-Based Hoare-Style Reasoning · Score 127
    title matched "agent"; title matched "reasoning"; summary matched "evaluation"
  5. From Translation to Superset: Benchmark-Driven Evolution of a Production AI Agent from Rust to Python · Score 126
    title matched "agent"; title matched "benchmark"; summary matched "evaluation"

2026-04-08

15 matches · Generated 2026-04-08 17:10:24 (Asia/Shanghai)