Feed Subscription

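LLM Fixed Subscription Page

Suited to long-term tracking of a single research direction. The page summarizes this feed's performance over the last 7 days and the last 30 days, and keeps each day's original matched entries and digest links.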
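The 7-day / 30-day summary mentioned above can be illustrated with a small sketch. It assumes that "performance" means the total number of hits in a trailing window that includes the reference date; the page does not say how the figure is actually computed. The function name `window_total` and the variable names are hypothetical. The daily counts are the hit counts listed in the history on this page.

```python
from datetime import date, timedelta

def window_total(daily_hits, as_of, days):
    """Sum hits over the `days`-day window ending on `as_of` (inclusive).

    Assumption: a window is a trailing range of calendar days; days with
    no digest simply contribute nothing.
    """
    start = as_of - timedelta(days=days - 1)
    return sum(n for d, n in daily_hits.items() if start <= d <= as_of)

# Hit counts per digest date, taken from the history listed on this page.
daily_hits = {
    date(2026, 4, 8): 15,
    date(2026, 4, 14): 15,
    date(2026, 4, 15): 15,
    date(2026, 4, 16): 15,
    date(2026, 4, 17): 15,
    date(2026, 4, 21): 15,
    date(2026, 4, 22): 15,
    date(2026, 4, 23): 15,
    date(2026, 4, 24): 15,
}

as_of = date(2026, 4, 24)  # hypothetical reference date for the summary
summary = {
    "7d": window_total(daily_hits, as_of, 7),
    "30d": window_total(daily_hits, as_of, 30),
}
print(summary)  # {'7d': 60, '30d': 135}
```

With 2026-04-24 as the reference date, the 7-day window covers four digests (04-21 through 04-24) and the 30-day window covers all nine listed digests.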


Recent Trend

LLM has no new matching papers today.

[Trend chart; date axis from 2026-06-15 to 2026-06-28]

Historical Hits

Review this feed's matched papers day by day; each day's digest is kept as raw Markdown / JSON output.

2026-04-24

15 hits · generated 2026-04-24 11:46:20 (Asia/Shanghai)

Markdown JSON

LLM: 15 papers

"Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows" [Evaluation / Application / Method]: The Model Context Protocol (MCP) has become a common interface…

Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows · Score 106
title matched "agent"; summary matched "reasoning"; summary matched "benchmark"
Original source
Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems · Score 106
title matched "agent"; summary matched "reasoning"; summary matched "benchmark"
Original source
AgenticQwen: Training Small Agentic Language Models with Dual Data Flywheels for Industrial-Scale Tool Use · Score 102
title matched "agent"; summary matched "reasoning"; summary matched "benchmark"
Original source
Temporal Taskification in Streaming Continual Learning: A Source of Evaluation Instability · Score 90
title matched "evaluation"; summary matched "benchmark"; has PDF
Original source
Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models · Score 90
title matched "agent"; summary matched "reasoning"; has PDF
Original source

2026-04-23

15 hits · generated 2026-04-23 11:42:13 (Asia/Shanghai)

Markdown JSON

LLM: 15 papers

"OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Model" [Evaluation / Method]: Large vision-language models (LVLMs) have made substantial advances in reasoning tasks at the Olympiad level. Neverth…

OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Model · Score 129
title matched "reasoning"; title matched "benchmark"; summary matched "evaluation"
Original source
V-tableR1: Process-Supervised Multimodal Table Reasoning with Critic-Guided Policy Optimization · Score 125
title matched "reasoning"; summary matched "alignment"; summary matched "benchmark"
Original source
ONOTE: Benchmarking Omnimodal Notation Processing for Expert-level Music Intelligence · Score 124
title matched "benchmark"; summary matched "reasoning"; summary matched "alignment"
Original source
Where and What: Reasoning Dynamic and Implicit Preferences in Situated Conversational Recommendation · Score 106
title matched "reasoning"; summary matched "alignment"; summary matched "benchmark"
Original source
Cooperative Profiles Predict Multi-Agent LLM Team Performance in AI for Science Workflows · Score 105
title matched "agent"; summary matched "reasoning"; summary matched "benchmark"
Original source

2026-04-22

15 hits · generated 2026-04-22 11:37:03 (Asia/Shanghai)

Markdown JSON

LLM: 15 papers

"Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents" [Evaluation / Application / Method]: Long-horizon enterprise agents make high-stakes decisions (loan underwriting, claims adjudication, clinical review, prior authorization)…

Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents · Score 162
title matched "agent"; title matched "alignment"; summary matched "reasoning"
Original source
Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps · Score 149
title matched "agent"; title matched "benchmark"; title matched "evaluation"
Original source
Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment · Score 145
title matched "agent"; title matched "alignment"; summary matched "reasoning"
Original source
Discovering a Shared Logical Subspace: Steering LLM Logical Reasoning via Alignment of Natural-Language and Symbolic Views · Score 130
title matched "reasoning"; title matched "alignment"; summary matched "benchmark"
Original source
Revac: A Social Deduction Reasoning Agent · Score 127
title matched "agent"; title matched "reasoning"; summary matched "evaluation"
Original source

2026-04-21

15 hits · generated 2026-04-21 11:40:46 (Asia/Shanghai)

Markdown JSON

LLM: 15 papers

"MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval" [Evaluation / Data / Method]: Mathematical problem solving remains a challenging test of reasoning for large language and multimodal models, yet existing…

MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval · Score 112
title matched "reasoning"; title matched "benchmark"; has PDF
Original source
Benchmarking System Dynamics AI Assistants: Cloud Versus Local LLMs on CLD Extraction and Discussion · Score 108
title matched "benchmark"; summary matched "reasoning"; summary matched "evaluation"
Original source
ClawEnvKit: Automatic Environment Generation for Claw-Like Agents · Score 107
title matched "agent"; summary matched "benchmark"; summary matched "evaluation"
Original source
MASS-RAG: Multi-Agent Synthesis Retrieval-Augmented Generation · Score 107
title matched "agent"; summary matched "reasoning"; summary matched "benchmark"
Original source
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation · Score 106
title matched "reasoning"; summary matched "agent"; summary matched "benchmark"
Original source

2026-04-17

15 hits · generated 2026-04-17 11:39:21 (Asia/Shanghai)

Markdown JSON

LLM: 15 papers

"CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas" [Evaluation / Method]: It is increasingly important that LLM agents interact effectively and safely with other goal-pursuing agents, yet, re…

CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas · Score 130
title matched "agent"; title matched "benchmark"; summary matched "reasoning"
Original source
IE as Cache: Information Extraction Enhanced Agentic Reasoning · Score 124
title matched "agent"; title matched "reasoning"; summary matched "benchmark"
Original source
QuantCode-Bench: A Benchmark for Evaluating the Ability of Large Language Models to Generate Executable Algorithmic Trading Strategies · Score 123
title matched "benchmark"; summary matched "agent"; summary matched "alignment"
Original source
From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench · Score 122
title matched "agent"; summary matched "reasoning"; summary matched "benchmark"
Original source
An Axiomatic Benchmark for Evaluation of Scientific Novelty Metrics · Score 109
title matched "benchmark"; title matched "evaluation"; has PDF
Original source

2026-04-16

15 hits · generated 2026-04-16 11:43:00 (Asia/Shanghai)

Markdown JSON

LLM: 15 papers

"GeoAgentBench: A Dynamic Execution Benchmark for Tool-Augmented Agents in Spatial Analysis" [Evaluation / Application / Method]: The integration of Large Language Models (LLMs) into Geographic Information Systems (GIS) marks a paradigm shift…

GeoAgentBench: A Dynamic Execution Benchmark for Tool-Augmented Agents in Spatial Analysis · Score 162
title matched "agent"; title matched "benchmark"; summary matched "reasoning"
Original source
HINTBench: Horizon-agent Intrinsic Non-attack Trajectory Benchmark · Score 127
title matched "agent"; title matched "benchmark"; summary matched "evaluation"
Original source
Character Beyond Speech: Leveraging Role-Playing Evaluation in Audio Large Language Models via Reinforcement Learning · Score 120
title matched "evaluation"; summary matched "agent"; summary matched "reasoning"
Original source
LongCoT: Benchmarking Long-Horizon Chain-of-Thought Reasoning · Score 112
title matched "reasoning"; title matched "benchmark"; has PDF
Original source
Correct Prediction, Wrong Steps? Consensus Reasoning Knowledge Graph for Robust Chain-of-Thought Synthesis · Score 108
title matched "reasoning"; summary matched "benchmark"; summary matched "evaluation"
Original source

2026-04-15

15 hits · generated 2026-04-15 11:35:50 (Asia/Shanghai)

Markdown JSON

LLM: 15 papers

"Parallax: Why AI Agents That Think Must Never Act" [Evaluation / Application / Method]: Autonomous AI agents are rapidly transitioning from experimental tools to operational infrastructure, with projections that 80% of enterprise application…

Parallax: Why AI Agents That Think Must Never Act · Score 107
title matched "agent"; summary matched "reasoning"; summary matched "evaluation"
Original source
Drawing on Memory: Dual-Trace Encoding Improves Cross-Session Recall in LLM Agents · Score 107
title matched "agent"; summary matched "reasoning"; summary matched "benchmark"
Original source
Round-Trip Translation Reveals What Frontier Multilingual Benchmarks Miss · Score 106
title matched "benchmark"; summary matched "reasoning"; summary matched "evaluation"
Original source
Towards Long-horizon Agentic Multimodal Search · Score 106
title matched "agent"; summary matched "reasoning"; summary matched "benchmark"
Original source
QuarkMedSearch: A Long-Horizon Deep Search Agent for Exploring Medical Intelligence · Score 105
title matched "agent"; summary matched "benchmark"; summary matched "evaluation"
Original source

2026-04-14

15 hits · generated 2026-04-14 11:37:06 (Asia/Shanghai)

Markdown JSON

LLM: 15 papers

"UniToolCall: Unifying Tool-Use Representation, Data, and Evaluation for LLM Agents" [Evaluation / Data / Application / Method]: Tool-use capability is a fundamental component of LLM agents, enabling them to interact with external systems throu…

UniToolCall: Unifying Tool-Use Representation, Data, and Evaluation for LLM Agents · Score 145
title matched "agent"; title matched "evaluation"; summary matched "reasoning"
Original source
General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks · Score 130
title matched "reasoning"; title matched "benchmark"; summary matched "evaluation"
Original source
Collaborative Multi-Agent Scripts Generation for Enhancing Imperfect-Information Reasoning in Murder Mystery Games · Score 129
title matched "agent"; title matched "reasoning"; summary matched "benchmark"
Original source
FM-Agent: Scaling Formal Methods to Large Systems via LLM-Based Hoare-Style Reasoning · Score 127
title matched "agent"; title matched "reasoning"; summary matched "evaluation"
Original source
From Translation to Superset: Benchmark-Driven Evolution of a Production AI Agent from Rust to Python · Score 126
title matched "agent"; title matched "benchmark"; summary matched "evaluation"
Original source

2026-04-08

15 hits · generated 2026-04-08 17:10:24 (Asia/Shanghai)

Markdown JSON

LLM: 15 papers

Includes 15 papers; highlights include "Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework" and "Topological Characterization of Churn Flow and Unsupervised Correction to the Wu Flow-Regime Map in Small-Diameter Vertic…
