Action Queue
This page gathers the papers most worth handling next, so you can turn daily research intelligence into concrete follow-up actions.
Priority: 2 overdue papers
Next step set: 0 papers awaiting progress
Reading progress: 0 papers being read
Look first at papers that are overdue, due within 3 days, or have a next step written down that has not yet been acted on.
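The prioritization rule above can be sketched as a small triage function. This is a minimal sketch, not the dashboard's actual implementation: the `Paper` fields, the bucket names, and the order in which the three conditions are checked are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Paper:
    # Hypothetical record; field names are assumptions, not the page's schema.
    title: str
    due: Optional[date] = None        # latest handling date, if one was set
    next_step: Optional[str] = None   # free-text next action, if one was written
    progressed: bool = False          # whether the next step has been acted on


def triage_bucket(paper: Paper, today: date, soon_days: int = 3) -> Optional[str]:
    """Return the priority bucket for a paper, or None if no action is needed yet."""
    # Overdue: the planned handling date has already passed.
    if paper.due is not None and paper.due < today:
        return "overdue"
    # Due soon: the handling date falls within the next `soon_days` days.
    if paper.due is not None and paper.due <= today + timedelta(days=soon_days):
        return "due_soon"
    # A next step exists but has not been acted on.
    if paper.next_step and not paper.progressed:
        return "next_step_pending"
    return None
```

Because the overdue check runs first in this sketch, a paper that is both overdue and has a pending next step is counted once, under "overdue".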
Papers past their planned handling date. Clear this backlog first.
Review Queue
Starred: The rapid growth of scientific literature has made it increasingly difficult for researchers to efficiently discover, evaluate, and synthesize relevant work. Recent advances in multi-agent large language models (LLMs) h…
Note: Anchor paper for the multi-agent discovery workflow; compare its planner design with newer agent benchmarks.
Next step: compare planner design with newer agent benchmarks
Handle by: 2026-04-18
Review Queue
To follow up: Large Language Models (LLMs) are increasingly deployed in medicine. However, their utility for non-generative clinical prediction is under-evaluated, and they are often assumed to be inferior to specialized models, crea…
Note: Recheck whether ClinicRealm still beats classical clinical baselines under the same task framing.
Next step: recheck benchmark framing against classical baselines
Handle by: 2026-04-20
Review cycle: every 14 days
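A recurring review cycle like the one on this card amounts to simple date arithmetic. The sketch below assumes the cycle is counted from the date of the last review; the page does not say what the 14 days are anchored to, and `next_review` is a hypothetical helper name.

```python
from datetime import date, timedelta


def next_review(last_reviewed: date, cycle_days: int = 14) -> date:
    """Return the next review date, assuming the cycle restarts at each review."""
    return last_reviewed + timedelta(days=cycle_days)


# Example: a paper reviewed on 2026-04-20 comes up again on 2026-05-04.
print(next_review(date(2026, 4, 20)))
```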
Papers that appeared in the last 7 days and are highly relevant, but have not yet been given any personal feedback status.
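The selection rule for this section can be sketched as a filter over candidate papers. This is a sketch under stated assumptions: the `Candidate` fields, the numeric `relevance` score, and the `min_relevance` threshold of 0.8 are hypothetical, since the page only says "highly relevant" without defining it.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Candidate:
    # Hypothetical record; field names are assumptions, not the page's schema.
    title: str
    first_seen: date                 # date the paper first appeared in the feed
    relevance: float                 # assumed score in [0, 1]
    feedback: Optional[str] = None   # e.g. "starred"; None means no feedback yet


def fresh_unreviewed(
    candidates: List[Candidate],
    today: date,
    window_days: int = 7,
    min_relevance: float = 0.8,      # assumed cutoff for "highly relevant"
) -> List[Candidate]:
    """Keep recent, highly relevant papers with no feedback, most relevant first."""
    cutoff = today - timedelta(days=window_days)
    picked = [
        c for c in candidates
        if c.first_seen >= cutoff
        and c.relevance >= min_relevance
        and c.feedback is None
    ]
    return sorted(picked, key=lambda c: c.relevance, reverse=True)
```

Sorting by relevance is a design choice made here for illustration; the page does not state how the list is ordered.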
Review Queue
Large Language Models (LLMs) have made significant progress in reasoning, particularly in deductive reasoning, which is crucial for high-stakes decision-making. As models improve, evaluation benchmarks should evolve to…
Review Queue
Large language models (LLMs) have demonstrated strong performance across a wide range of tasks, but ensuring their reliability in highly technical domains remains a significant challenge. In nuclear engineering, problem…
Review Queue
Large language model (LLM) agents are increasingly proposed as supervisory components for safety-critical systems, yet their robustness under sustained, adaptive adversarial pressure remains poorly characterized. We pre…
Review Queue
Following the paradigm shift initiated by OpenAI o3, interleaved reasoning with code to enhance multimodal large language models (MLLMs) has become a pivotal research frontier. The existing literature focuses primarily…
Review Queue
Large language models are increasingly deployed as agents that reason over documents rather than answer from parametric knowledge. We study archive-grounded reasoning: locating sparse evidence across a large, messy coll…
Review Queue
While Large Language Models (LLMs) are increasingly deployed in long interactions, existing evaluations focus predominantly on retrospective memory (RM) via explicit queries. Prospective memory (PM), the critical abilit…
Review Queue
The explosive growth and complexity of product data within the dynamic Brazilian e-commerce landscape demand robust and specialized methods for structured information extraction. Traditional approaches to Product Attrib…
Review Queue
Large language models (LLMs) have achieved strong performance across a wide range of language-based tasks by leveraging both extensive parametric knowledge and in-context learning ability, enabling them to incorporate e…
Review Queue
Large language models are increasingly deployed as investment research assistants, yet no benchmark tests whether they can accurately reconstruct and apply the specific procedural decision frameworks of expert investors…
Review Queue
Vision-language models (VLMs) have achieved strong performance on OCR-based benchmarks and increasingly focused on text-rich understanding, but their robustness under controlled visual degradation remains insufficiently…
Review Queue
Standard benchmarks for multimodal large language models (MLLMs) score each item on one canonical ordering and miss whether order-irrelevant shuffling changes the answer, a baseline reliability property called for by em…
Review Queue
Rare diseases affect millions of individuals worldwide, yet timely diagnosis remains a major public health challenge due to scarcity of specialized clinical expertise. While large language models (LLMs) show promise to…