Action Queue
This page gathers the papers most worth handling next, so that each day's research intelligence turns into concrete follow-up actions.
Prioritize: 2 overdue papers
Next step set: 0 papers awaiting follow-through
Reading progress: 0 papers being read
Look first at papers that are overdue, papers due within 3 days, and papers that have a next step written down but not yet acted on.
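The triage rule above can be sketched in code. This is a minimal illustration, not the tool's actual implementation: the `Paper` record, its field names, and the bucket labels are all hypothetical, and checking the three conditions in order so that each paper lands in exactly one bucket (overdue first) is an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Paper:
    # Hypothetical record; these field names are assumptions, not the tool's schema.
    title: str
    due: Optional[date] = None       # "due by" date, if one is set
    next_step: Optional[str] = None  # written-down next action, if any
    advanced: bool = False           # True once the next step has been acted on


def priority_bucket(paper: Paper, today: date) -> Optional[str]:
    """Return the bucket a paper falls into, or None if it needs no attention."""
    if paper.due is not None and paper.due < today:
        return "overdue"
    if paper.due is not None and paper.due <= today + timedelta(days=3):
        return "due_soon"
    if paper.next_step and not paper.advanced:
        return "next_step_pending"
    return None


def action_queue(papers: List[Paper], today: date) -> List[Paper]:
    """Order flagged papers by bucket, then by earliest due date."""
    order = {"overdue": 0, "due_soon": 1, "next_step_pending": 2}
    flagged = []
    for paper in papers:
        bucket = priority_bucket(paper, today)
        if bucket is not None:
            # Papers without a due date sort last within their bucket.
            flagged.append((order[bucket], paper.due or date.max, paper))
    flagged.sort(key=lambda item: item[:2])
    return [paper for _, _, paper in flagged]
```

Under this assumed ordering, a paper that is overdue and also has an unstarted next step is counted once, as overdue.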
Papers that are past their planned handling date; clear this backlog first.
Review Queue
Starred
The rapid growth of scientific literature has made it increasingly difficult for researchers to efficiently discover, evaluate, and synthesize relevant work. Recent advances in multi-agent large language models (LLMs) h…
Note: Anchor paper for the multi-agent discovery workflow; compare its planner design with newer agent benchmarks.
Next step: compare planner design with newer agent benchmarks
Due by: 2026-04-18
Review Queue
To follow up
Large Language Models (LLMs) are increasingly deployed in medicine. However, their utility for non-generative clinical prediction is under-evaluated, and they are often assumed to be inferior to specialized models, crea…
Note: Recheck whether ClinicRealm still beats classical clinical baselines under the same task framing.
Next step: recheck benchmark framing against classical baselines
Due by: 2026-04-20
Review cycle: every 14 days
Papers that appeared in the last 7 days and are highly relevant but have not yet been given any personal feedback status.
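As a rough sketch, the selection rule for this list can be written as a single predicate. The function name, its parameters, and the numeric relevance cutoff are all hypothetical; the page does not say how "highly relevant" is scored, so the 0.8 default is only a placeholder.

```python
from datetime import date, timedelta
from typing import Optional


def is_fresh_candidate(
    first_seen: date,
    relevance: float,
    feedback: Optional[str],
    today: date,
    min_relevance: float = 0.8,  # assumed cutoff, not taken from the page
) -> bool:
    """True if a paper is recent, relevant, and has no feedback status yet."""
    age = today - first_seen
    is_recent = timedelta(0) <= age <= timedelta(days=7)
    return is_recent and relevance >= min_relevance and feedback is None
```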
Review Queue
Long-horizon enterprise agents make high-stakes decisions (loan underwriting, claims adjudication, clinical review, prior authorization) under lossy memory, multi-step reasoning, and binding regulatory constraints. Curr…
Review Queue
The integration of Large Language Models (LLMs) into Geographic Information Systems (GIS) marks a paradigm shift toward autonomous spatial analysis. However, evaluating these LLM-based agents remains challenging due to…
Review Queue
We introduce the Cyber Defense Benchmark, a benchmark for measuring how well large language model (LLM) agents perform the core SOC analyst task of threat hunting: given a database of raw Windows event logs with no guid…
Review Queue
Large Language Model agents have rapidly evolved from static text generators into dynamic systems capable of executing complex autonomous workflows. To enhance reliability, multi-agent frameworks assigning specialized r…
Review Queue
Large Language Models (LLMs) still struggle with multi-step logical reasoning. Existing approaches either purely refine the reasoning chain in natural language form or attach a symbolic solver as an external module. In…
Review Queue
It is increasingly important that LLM agents interact effectively and safely with other goal-pursuing agents, yet, recent works report the opposite trend: LLMs with stronger reasoning capabilities behave _less_ cooperat…
Review Queue
Social deduction games such as Mafia present a unique AI challenge: players must reason under uncertainty, interpret incomplete and intentionally misleading information, evaluate human-like communication, and make strat…
Review Queue
Existing agent-safety evaluation has focused mainly on externally induced risks. Yet agents may still enter unsafe trajectories under benign conditions. We study this complementary but underexplored setting through the…
Review Queue
The rapid adoption of Large Language Models (LLMs) has spurred interest in automated peer review; however, progress is currently stifled by benchmarks that treat reviewing primarily as a rating prediction task. We argue…
Review Queue
Understanding artworks requires multi-step reasoning over visual content and cultural, historical, and stylistic context. While recent multimodal large language models show promise in artwork explanation, they rely on i…
Review Queue
Information Extraction aims to distill structured, decision-relevant information from unstructured text, serving as a foundation for downstream understanding and reasoning. However, it is traditionally treated merely as…
Review Queue
Reinforcement learning (RL) as post-training is crucial for enhancing the reasoning ability of large language models (LLMs) in coding and math. However, their capacity for visual semantic arithmetic, inferring relations…