Action Queue

Review Queue

This page gathers the papers most worth handling next, helping you turn daily research intelligence into concrete follow-up actions.

Priority

2 overdue papers

0 papers due within 3 days

Next step set

0 papers awaiting progress

2 papers with an action plan

Reading progress

0 papers being read

0 papers completed

Action queue

Prioritize papers that are overdue, due within 3 days, or have a next action written down but not yet advanced.

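New and unmarked

Papers that appeared in the last 7 days and are highly relevant but have not yet been given any personal feedback status.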

Review Queue

Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents

Long-horizon enterprise agents make high-stakes decisions (loan underwriting, claims adjudication, clinical review, prior authorization) under lossy memory, multi-step reasoning, and binding regulatory constraints. Curr…

1 day, 1 feed, 1 hit
First seen: 2026-04-22 11:37:03 (UTC+08:00); last seen: 2026-04-22 11:37:03 (UTC+08:00)

Review Queue

GeoAgentBench: A Dynamic Execution Benchmark for Tool-Augmented Agents in Spatial Analysis

The integration of Large Language Models (LLMs) into Geographic Information Systems (GIS) marks a paradigm shift toward autonomous spatial analysis. However, evaluating these LLM-based agents remains challenging due to…

1 day, 1 feed, 1 hit
First seen: 2026-04-16 11:43:00 (UTC+08:00); last seen: 2026-04-16 11:43:00 (UTC+08:00)

Review Queue

Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps

We introduce the Cyber Defense Benchmark, a benchmark for measuring how well large language model (LLM) agents perform the core SOC analyst task of threat hunting: given a database of raw Windows event logs with no guid…

1 day, 1 feed, 1 hit
First seen: 2026-04-22 11:37:03 (UTC+08:00); last seen: 2026-04-22 11:37:03 (UTC+08:00)

Review Queue

Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment

Large Language Model agents have rapidly evolved from static text generators into dynamic systems capable of executing complex autonomous workflows. To enhance reliability, multi-agent frameworks assigning specialized r…

1 day, 1 feed, 1 hit
First seen: 2026-04-22 11:37:03 (UTC+08:00); last seen: 2026-04-22 11:37:03 (UTC+08:00)

Review Queue

Discovering a Shared Logical Subspace: Steering LLM Logical Reasoning via Alignment of Natural-Language and Symbolic Views

Large Language Models (LLMs) still struggle with multi-step logical reasoning. Existing approaches either purely refine the reasoning chain in natural language form or attach a symbolic solver as an external module. In…

1 day, 1 feed, 1 hit
First seen: 2026-04-22 11:37:03 (UTC+08:00); last seen: 2026-04-22 11:37:03 (UTC+08:00)

Review Queue

CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas

It is increasingly important that LLM agents interact effectively and safely with other goal-pursuing agents, yet, recent works report the opposite trend: LLMs with stronger reasoning capabilities behave _less_ cooperat…

1 day, 1 feed, 1 hit
First seen: 2026-04-17 11:39:21 (UTC+08:00); last seen: 2026-04-17 11:39:21 (UTC+08:00)

Review Queue

Revac: A Social Deduction Reasoning Agent

Social deduction games such as Mafia present a unique AI challenge: players must reason under uncertainty, interpret incomplete and intentionally misleading information, evaluate human-like communication, and make strat…

1 day, 1 feed, 1 hit
First seen: 2026-04-22 11:37:03 (UTC+08:00); last seen: 2026-04-22 11:37:03 (UTC+08:00)

Review Queue

HINTBench: Horizon-agent Intrinsic Non-attack Trajectory Benchmark

Existing agent-safety evaluation has focused mainly on externally induced risks. Yet agents may still enter unsafe trajectories under benign conditions. We study this complementary but underexplored setting through the…

1 day, 1 feed, 1 hit
First seen: 2026-04-16 11:43:00 (UTC+08:00); last seen: 2026-04-16 11:43:00 (UTC+08:00)

Review Queue

Beyond Rating: A Comprehensive Evaluation and Benchmark for AI Reviews

The rapid adoption of Large Language Models (LLMs) has spurred interest in automated peer review; however, progress is currently stifled by benchmarks that treat reviewing primarily as a rating prediction task. We argue…

1 day, 1 feed, 1 hit
First seen: 2026-04-22 11:37:03 (UTC+08:00); last seen: 2026-04-22 11:37:03 (UTC+08:00)

Review Queue

A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding

Understanding artworks requires multi-step reasoning over visual content and cultural, historical, and stylistic context. While recent multimodal large language models show promise in artwork explanation, they rely on i…

1 day, 1 feed, 1 hit
First seen: 2026-04-22 11:37:03 (UTC+08:00); last seen: 2026-04-22 11:37:03 (UTC+08:00)

Review Queue

IE as Cache: Information Extraction Enhanced Agentic Reasoning

Information Extraction aims to distill structured, decision-relevant information from unstructured text, serving as a foundation for downstream understanding and reasoning. However, it is traditionally treated merely as…

1 day, 1 feed, 1 hit
First seen: 2026-04-17 11:39:21 (UTC+08:00); last seen: 2026-04-17 11:39:21 (UTC+08:00)

Review Queue

Multi-modal Reasoning with LLMs for Visual Semantic Arithmetic

Reinforcement learning (RL) as post-training is crucial for enhancing the reasoning ability of large language models (LLMs) in coding and math. However, their capacity for visual semantic arithmetic, inferring relations…

1 day, 1 feed, 1 hit
First seen: 2026-04-22 11:37:03 (UTC+08:00); last seen: 2026-04-22 11:37:03 (UTC+08:00)