Last 7 days: 91 papers
Paper Digest Archive
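Aggregates the digest.json and digest.md files generated each day. Supports filtering by feed and searching by title keyword, and provides fixed feed pages, long-term keyword tracking pages, a momentum view, a reading list, a weekly review, and trend pages for the last 7 and 30 days.
Upgrades the site from "paging through by day" to "tracking topics over the long term." Fixed pages are better suited to revisiting the same kind of research signal every day.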
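As a rough illustration of the feed filter and title keyword search described above, here is a minimal Python sketch. The entry shape (a dict with `title` and `feed` keys), the function name `filter_entries`, and the sample titles are all assumptions made for this example; the actual digest.json schema is not shown on this page.

```python
# Hypothetical entry shape: {"title": str, "feed": str}.
# The real digest.json schema is not documented here, so these keys are assumptions.
def filter_entries(entries, feed=None, keyword=None):
    """Return entries that match an optional feed name and an optional title keyword."""
    needle = keyword.casefold() if keyword else None
    result = []
    for entry in entries:
        # Feed filter: exact match on the feed name.
        if feed is not None and entry.get("feed") != feed:
            continue
        # Keyword search: case-insensitive substring match on the title.
        if needle is not None and needle not in entry.get("title", "").casefold():
            continue
        result.append(entry)
    return result


# Made-up sample entries; only the feed names are taken from this page.
sample = [
    {"title": "Example benchmark for terminal agents", "feed": "Terminal and SWE Agents"},
    {"title": "Example jailbreak detection study", "feed": "Agent Runtime Security"},
    {"title": "Example benchmark for vision models", "feed": "Vision"},
]

benchmark_hits = filter_entries(sample, keyword="Benchmark")
vision_hits = filter_entries(sample, feed="Vision", keyword="benchmark")
```

Both filters are optional, so the same helper could back a feed-only page, a keyword-only page, or a combined view.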
Trends
Compresses feed hits and keyword hits into a long-term view that can be browsed continuously.
Action Queue
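Gathers highly relevant unmarked papers, follow-up papers that have reappeared, and starred papers awaiting action into a single action list.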
Notification Memory
Visualizes the action reasons currently remembered, explaining why certain reminders were not sent again today.
Momentum
Pulls out papers that recur across multiple days or multiple feeds, as an entry point for long-term observation.
Weekly Review
Wraps up starred and follow-up papers by week, suited to a focused review before the weekly meeting.
Reading List
Collects the papers you have starred or marked for follow-up into a long-term personal research list.
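The Momentum view above surfaces papers that recur across several days or several feeds. One way this could be computed is sketched below in Python. The `(title, day, feed)` tuple shape, the function name `find_momentum`, the thresholds, and the sample data are all assumptions for illustration, not the site's actual implementation.

```python
from collections import defaultdict


def find_momentum(sightings, min_days=2, min_feeds=2):
    """Return sorted titles seen on >= min_days distinct days or in >= min_feeds distinct feeds.

    sightings: iterable of (title, day, feed) tuples (hypothetical shape).
    """
    days = defaultdict(set)
    feeds = defaultdict(set)
    for title, day, feed in sightings:
        days[title].add(day)
        feeds[title].add(feed)
    return sorted(
        title
        for title in days
        if len(days[title]) >= min_days or len(feeds[title]) >= min_feeds
    )


# Made-up sightings; dates and titles are placeholders.
sightings = [
    ("Paper A", "2024-01-01", "LLM"),
    ("Paper A", "2024-01-02", "LLM"),     # same feed, second day
    ("Paper B", "2024-01-01", "LLM"),
    ("Paper B", "2024-01-01", "Vision"),  # same day, second feed
    ("Paper C", "2024-01-03", "Vision"),  # seen only once
]

recurring = find_momentum(sightings)
```

Using sets per title means repeated sightings on the same day or in the same feed are counted only once.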
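Feed
LM: no new matching papers today.
Feed
LLM: no new matching papers today.
Feed
Agent Runtime Security: no new matching papers today.
Feed
Vision: no new matching papers today.
Feed
Terminal and SWE Agents: no new matching papers today.
Feed
PubMed AI: no new matching papers today.
Feed
"AI-Driven Multi-Agent System for Autonomous Mining Operation Centers" [Method]: International audience
These pages come from your feed keyword configuration.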
Last 7 days: 91 papers
Last 30 days: 378 papers
Full archive: 1016 papers
"NuclearQAv2: A Structured Benchmark for Evaluating Domain-Science Competence in Large Language Models" [Evaluation / Data / Method]: Large language models (LLMs) have demonstrated strong performance across a wide range of tasks, but e…
"Jailbreaking for the Average Jane: Choosing Optimal Jailbreaks via Bandit Algorithms for Automatically Enhanced Queries" [Evaluation / Application / Method]: With a profusion of jailbreaks for LLMs now widely known, a growing concern is that…
"Smaller Models, Unexpected Costs: Trade-offs in LLM Quantization for Automated Program Repair" [Evaluation / Method]: Language Models (LLMs) are powerful tools and have been increasingly adopted for complex software engineering tasks…
"InvestPhilBench: A Multi-Layer Dynamic Benchmark for Evaluating Large Language Model Procedural Reasoning in Expert Investment Philosophy" [Evaluation / Application / Method]: Large language models are increasingly deployed as investment res…
"How Reliable Is Your Jailbreak Judge? Calibration and Adversarial Robustness of Automated ASR Scoring" [Method]: Almost every paper on LLM jailbreaks and prompt injection reports an attack-success rate (ASR), and that number…
"Unlocking Model Potentials Through Adaptive Multi-Agent Scaffolding for Efficient Issue Resolution" [Evaluation / Application / Method]: Resolving issues with ambiguous and incomplete descriptions, particularly concerning complex bugs, requi…
"AGORA: An Archive-Grounded Benchmark for Agentic Workplace Document Reasoning" [Evaluation / Method]: Large language models are increasingly deployed as agents that reason over documents rather than answer from parametric knowledge.…
"Burnyard: Future of Malware Analysis" [Method]: Malware analysis is a critical aspect of modern cybersecurity. The prevailing industry practice, sandboxing, involves executing suspicious binaries within isol…; "LLMs Prompted…
"SHERLOC: Structured Diagnostic Localization for Code Repair Agents" [Method]: LLM agents solve repository-level coding tasks through multi-turn tool use, but utilize half their budget on locating faults before editing. Dedic…
"AIR: Adaptive Interleaved Reasoning with Code in MLLMs" [Evaluation / Data / Application / Method]: Following the paradigm shift initiated by OpenAI o3, interleaved reasoning with code to enhance multimodal large language models (MLLMs) has be…
"Capable but Careless: Do Computer-Use Agents Follow Contextual Integrity?" [Evaluation / Application / Method]: Computer-use agents (CUAs) now act on a user's behalf across personal applications such as email, calendars, and to-do lists. Thi…
"Tmax: A simple recipe for terminal agents" [Evaluation / Data / Application / Method]: Terminal-using agents have quickly become the most popular downstream application of language models (LMs). Despite their prevalence, relatively little acad…
"QMFOL: Benchmarking Large Language Model Reasoning via Quantifiable Monadic First-Order Logic Test Case Generation" [Evaluation / Method]: Large Language Models (LLMs) have made significant progress in reasoning, particularly in ded…
"What Do Safety-Aligned LLMs Learn From Mixed Compliance Demonstrations?" [Method]: Prior work has shown that in-context demonstrations can jailbreak language models, but it remains unclear how models interpret different type…
"Probe-and-Refine Tuning of Repository Guidance for Coding Agents" [Application / Method]: LLM-based coding agents need higher-level operational knowledge about a repository (which files house which subsystems, how to run the test sui…
"Enhancing Decision-Making with Large Language Models through Multi-Agent Fictitious Play" [Evaluation / Method]: Large language model (LLM)-based multi-agent systems (MAS) have demonstrated great potential in solving tasks with exec…
"CodeSentinel: A Three-Layer Defense Against Indirect Prompt Injection in Code Contexts" [Method]: Code large language models increasingly retrieve external code context from repositories, documentation, issue threads, and co…
"Data Intelligence Agents: Interpreting, Modeling, and Querying Enterprise Data via Autonomous Coding Agents" [Evaluation / Application / Method]: Production data integration is bottlenecked by repeated, lossy handoffs between data owners, en…
"Evaluating Open-Source LLMs for Multi-Label ATT&CK Technique Classification on CTI Reports" [Evaluation / Data / Application / Method]: Classifying Cyber Threat Intelligence (CTI) using MITRE Adversarial Tactics, Techniques, and Common Knowled…
"Seeing Is Not Screening: Multimodal Hidden Instruction Attacks on Agent Skill Scanners" [Application / Method]: Agent skills are emerging as an important attack surface in LLM-based systems. Through an empirical study of existing ski…
"All Smoke, No Alarm: Oracle Signals in Agent-Authored Test Code" [Method]: Software practitioners increasingly use AI coding agents that generate test code alongside production code in open source pull requests (PRs). Recent…
"OpenClaw-Skill: Collective Skill Tree Search for Agentic Large Language Models" [Evaluation / Application / Method]: Equipping Large Language Model (LLM) agents with effective skills is crucial for solving complex tasks in real-world systems…
"Automated jailbreak attack targeting multiple defense strategies" [Evaluation / Method]: Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks. However, their safety remains a critical c…
"Agent trajectories as programs: fingerprinting and programming coding-agent behavior" [Evaluation / Data / Application / Method]: Benchmark scores tell you what an agent got right; they do not tell you how it got there. In this work, we introd…
"EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments" [Evaluation / Application / Method]: Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations as…
"Neuro-Symbolic Agents for Regulated Process Automation: Challenges and Research Agenda" [Application / Method]: LLM-based agents are entering regulated industries where they automate judgment intensive quality management processes. W…
"Understanding the Rejection of Fixes Generated by Agentic Pull Requests -- Insights from the AIDev Dataset" [Data / Method]: AI coding agents are increasingly used to generate pull requests (PRs) that propose code fixes in sof…
"Measuring Epistemic Resilience of LLMs Under Misleading Medical Context" [Evaluation / Application / Method]: Large language models (LLMs) now reach expert-level scores on medical licensing exams, encouraging the assumption that high scores…
"Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code" [Evaluation / Method]: Large Language Models (LLMs) are increasingly used for code generation, raising concerns that they may be misused to produce mali…
"PROJECTMEM: A Local-First, Event-Sourced Memory and Judgment Layer for AI Coding Agents" [Application / Method]: AI coding assistants now support a growing share of software work, from quick scripts to production applications. Yet th…
"T1-Bench: Benchmarking Multi-Scenario Agents in Real-World Domains" [Evaluation / Application / Method]: Recent advances in reasoning and tool-calling capabilities of large language models (LLMs) have enabled increasingly capable agentic sys…
"Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation" [Evaluation / Application / Method]: Large language model (LLM) agents are rapidly moving from conversational interfaces to software components that plan, invoke t…
"Frontier Coding Agents Use Metaprogramming to Adapt to Unfamiliar Programming Languages" [Evaluation / Method]: LLM-based coding agents are usually evaluated in familiar software settings: mainstream languages, common libraries, and…
"SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks" [Evaluation / Method]: Spatial reasoning is a foundational capability for multimodal large language models (MLLMs) to perceive and op…
"WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces" [Evaluation / Method]: Computer-use agents (CUAs) increasingly operate in runtimes that combine visual desktop control, command-line ex…
"SIGA: Self-Evolving Coding-Agent Adapters for Scientific Simulation" [Method]: Advanced scientific simulators expose specialized input languages that turn simulation goals into executable configurations, but learning them ca…
"MCBench: A Multicontext Safety Assessment Benchmark for Omni Large Language Models" [Evaluation / Method]: Existing multimodal safety benchmarks focus solely on visual inputs and cannot assess Omni Large Language Models (LLMs) that…
"GuardNet: Ensemble Strategies of Shallow Neural Networks for Robust Prompt Injection and Jailbreak Detection" [Evaluation / Data / Application / Method]: Large Language Models (LLMs) have transformed natural language processing, but they remai…
"ADK Arena: Evaluating Agent Development Kits via LLM-as-a-Developer" [Evaluation / Method]: The rapid proliferation of Agent Development Kits (ADKs), SDK-level frameworks for building LLM-powered autonomous agents, has outpaced any…
"A Systematic Evaluation of Positional Bias in Multi-Video Summarization with MLLMs" [Evaluation / Application / Method]: Multimodal Large Language Models (MLLMs) are increasingly used for video understanding, yet their reliability under mult…
"MaskForge: Structure-Aware Adaptive Attacks for Jailbreaking Diffusion Large Language Models" [Evaluation / Application / Method]: Diffusion large language models (dLLMs) generate text by iteratively denoising partially masked sequences unde…
"Latent Anchor-Driven Test Generation for Deep Neural Networks" [Data / Application / Method]: Deep Neural Networks (DNNs) are increasingly being deployed in security-critical and safety-sensitive applications, which makes rigorous test…
"Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning" [Evaluation / Method]: Large language models improve final-answer accuracy through extended chain-of-thought reasoning, but often spend tokens inefficie…
"D-Judge: Disrupting Multi-Turn Jailbreaks using Semantics-Preserving Output Rewriting" [Evaluation / Data / Method]: Multi-turn jailbreak attacks pose a growing threat to large language model (LLM) safety because they exploit feedback…
"What Makes Interaction Trajectories Effective for Training Terminal Agents?" [Method]: Stronger code agents are commonly assumed to be superior teachers for post-training, yet this assumption remains poorly disentangled from…
"POIROT: Interrogating Agents for Failure Detection in Multi-Agent Systems" [Evaluation / Application / Method]: Orchestrating Large Language Models into Multi-Agent Systems (LLM-MAS) has unlocked remarkable reasoning capabilities, yet emerge…
"Jailbreaking Multimodal Large Language Models using Multi-Clip Video" [Data / Application / Method]: As multimodal large language models (MLLMs) have advanced to process video inputs, concerns have emerged about their potential for mal…
"SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction" [Evaluation / Application / Method]: Agent skills occupy a privileged position in the agent workflow, as agents are expected to implicitly follow and execute them, re…
"FinBoardBench: Benchmarking Dynamic Wealth Management and Strategic Financial Reasoning of LLMs via Board Game Simulations" [Evaluation / Method]: Recently, large language models (LLMs) have achieved superior performance in static f…
"Provably Secure Agent Guardrail" [Evaluation / Application / Method]: As large language models transition from bounded generative engines to agents with expansive execution privileges, AI going out of control precipitates a funda…; "Robust an…
"Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software" [Application / Method]: Are AI agents tools, co-authors, or researchers? We present a quantified case study ($N=1$): a physicist sup…
"MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems" [Evaluation / Method]: Memory is essential for enabling large language models to support long-horizon reasoning, yet existing memory systems remain unr…
"Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents" [Data / Method]: Computer-use agents (CUAs) have recently made substantial progress, but deploying a separate large expert for each software…
"Calibrating Conservatism for Scalable Oversight" [Method]: Agentic AI systems capable of autonomous planning and extended environmental interaction pose a fundamental control problem: how can humans maintain meaningful…
"Traceable Knowledge Graph Reasoning Enables LLM-Assisted Decision Support for Industrial VOCs in the Steel Industry" [Evaluation / Data / Application / Method]: Key knowledge for steel-industry volatile organic compounds (VOCs) governance is s…
"EviACT: An Evidence-to-Action Framework for Agentic Program Repair" [Evaluation / Method]: LLM-based agents have moved automated program repair (APR) from fixed-context patch generation to interactive repository-level repair. Howeve…
Terminal and SWE Agents: no new matching papers today.
"Automated Benchmark Auditing for AI Agents and Large Language Models" [Evaluation / Data / Method]: Modern AI benchmarks operate at a complexity that outpaces traditional verification methods. Tasks authored by domain experts often co…
"CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents" [Evaluation / Data / Application / Method]: Reinforcement learning with verifiable rewards (RLVR) has driven breakthroughs in domains such as math, tool-use,…
Terminal and SWE Agents: no new matching papers today.
"Agentic CLEAR: Automating Multi-Level Evaluation of LLM Agents" [Evaluation / Method]: Agentic systems are becoming more capable: agents define strategies, take actions, and interact with different environments. This autonomy poses…
"DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback" [Evaluation / Application / Method]: LLM-powered AI agents require high-frequency state exploration (e.g., test-time tree search and reinforcement learn…
"'Refactoring Runaway': Understanding and Mitigating Tangled Refactorings in Coding Agents for Issue Resolution" [Method]: Recent advances in coding agents have shown remarkable progress in software issue resolution. In pract…
"Tracing the ongoing emergence of human-like reasoning in Large Language Models" [Method]: Humans effortlessly go beyond literal meanings: If you mow the lawn, I will give you fifty dollars, is typically understood as implyin…
"Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling" [Application / Method]: Computer-use agents (CUA) automate tasks specified with natural language such as "order the cheapest item from Taco Bell" by gene…
"SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents" [Evaluation / Method]: As long-horizon coding agents produce more code than any developer can review, oversight collapses onto a single surface: the automated test s…
"MixRea: Benchmarking Explicit-Implicit Reasoning in Large Language Models" [Evaluation / Method]: Large language models (LLMs) are increasingly integrated into high-stakes decision-making. Inspired by the theory of inattention…
"Attention-Guided Reward for Reinforcement Learning-based Jailbreak against Large Reasoning Models" [Evaluation / Method]: Large Reasoning Models (LRMs) have demonstrated remarkable capabilities in solving complex problems by generat…
"Does Code Cleanliness Affect Coding Agents? A Controlled Minimal-Pair Study" [Evaluation / Application / Method]: As autonomous coding agents see rapid adoption, their evaluation has primarily focused on task completion rates holding the tar…
"CrossView Suite: Harnessing Cross-view Spatial Intelligence of MLLMs with Dataset, Model and Benchmark" [Evaluation / Data / Method]: Spatial intelligence requires multimodal large language models (MLLMs) to move beyond single-view pe…
"An Empirical Study of Privacy Leakage Chains via Prompt Injection in Black-Box Chatbot Environments" [Method]: LLM-based chatbot agents increasingly process user requests by combining natural-language reasoning with external…
"Same Signal, Different Semantics: A Cross-Framework Behavioral Analysis of Software Engineering Agents" [Application / Method]: Behavioral studies of LLM-based software engineering agents extract operational rules about which traject…
"CryptoBench: A Dynamic Benchmark for Expert-Level Evaluation of LLM Agents in Cryptocurrency" [Evaluation / Application / Method]: This paper introduces CryptoBench, the first expert-curated, dynamic benchmark designed to rigorously evaluate…
Agent Runtime Security: no new matching papers today.
Terminal and SWE Agents: no new matching papers today.
"Talk is (Not) Cheap: A Taxonomy and Benchmark Coverage Audit for LLM Attacks" [Evaluation / Data / Method]: We introduce a reusable framework for auditing whether LLM attack benchmarks collectively cover the threat surface: a 4$\times…
Agent Runtime Security: no new matching papers today.
"CRANE: Constrained Reasoning Injection for Code Agents via Nullspace Editing" [Evaluation / Method]: Code agents must both reason over long-horizon repository state and obey strict tool-use protocols. In paired Instruct/Thinking che…
"RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation" [Evaluation / Data / Method]: Intensive care units (ICU) generate long, dense and evolving streams of clinical information, where physicia…
"Sleeper Channels and Provenance Gates: Persistent Prompt Injection in Always-on Autonomous AI Agents" [Evaluation / Application / Method]: Always-on AI agents (OpenClaw, Hermes Agent) run as a single persistent process under the owner's iden…
"MedHopQA: A Disease-Centered Multi-Hop Reasoning Benchmark and Evaluation Framework for LLM-Based Biomedical Question Answering" [Evaluation / Data / Application / Method]: Evaluating large language models (LLMs) in the biomedical domain requi…
"Metaphor Is Not All Attention Needs" [Application / Method]: Large language models are increasingly deployed in safety-critical applications, where their ability to resist harmful instructions is essential. Although post…; "A microser…
"WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation" [Evaluation / Method]: Large language and vision-language models increasingly power agents that act on a user's behalf through command-line interface (CLI) ha…
"Break the Brake, Not the Wheel: Untargeted Jailbreak via Entropy Maximization" [Evaluation / Method]: Recent studies show that gradient-based universal image jailbreaks on vision-language models (VLMs) exhibit little or no cross-mod…
LM: no new matching papers today.
Agent Runtime Security: no new matching papers today.
LM: no new matching papers today.
Agent Runtime Security: no new matching papers today.
LM: no new matching papers today.
Agent Runtime Security: no new matching papers today.
"LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG" [Evaluation / Data / Application / Method]: Single-step retrieval-augmented generation (RAG) provides an efficient way to incorporate external information for simple question…
"Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation" [Evaluation / Application / Method]: Self-hosted computer-use agents (SHCUAs), such as OpenClaw, combine natural-language interaction with direct acce…
"Misaligned by Reward: Socially Undesirable Preferences in LLMs" [Evaluation / Data / Method]: Reward models are a key component of large language model alignment, serving as proxies for human preferences during training. However, exis…
"Safety and accuracy follow different scaling laws in clinical large language models" [Evaluation / Application / Method]: Clinical LLMs are often scaled by increasing model size, context length, retrieval complexity, or inference-time comput…
"StressEval: Failure-Driven Dynamic Benchmarking for Knowledge-Intensive Reasoning in Large Language Models" [Evaluation / Data / Method]: Static benchmarks for LLMs are increasingly compromised by contamination and overfitting especia…
LM: no new matching papers today.
LM: no new matching papers today.
LM: no new matching papers today.
"Collaborative Agent Reasoning Engineering (CARE): A Three-Party Design Methodology for Systematically Engineering AI Agents with Subject Matter Experts, Developers, and Helper Agents" [Evaluation / Application / Method]: We present Collabora…
LM: no new matching papers today.
"LLM-ReSum: A Framework for LLM Reflective Summarization through Self-Evaluation" [Evaluation / Data / Method]: Reliable evaluation of large language model (LLM)-generated summaries remains an open challenge, particularly across hetero…
LM: no new matching papers today.
LLM: no new matching papers today.
Vision: no new matching papers today.
PubMed AI: no new matching papers today.
"AI-Driven Multi-Agent System for Autonomous Mining Operation Centers" [Method]: International audience
LLM: no new matching papers today.
Vision: no new matching papers today.
PubMed AI: no new matching papers today.
"SOS::LM Sequence Initializer: Semantic Process Architecture for Controlled, Traceable, and Structured Language Model Outputs" [Evaluation / Application / Method]: SOS::LM (Schloemer-Notation ::) defines a semantic process architecture for la…
LLM: no new matching papers today.
Vision: no new matching papers today.
"Establishing Clinically Significant Change Benchmarks for the Moral Injury Outcome Scale in VA Behavioral Health Settings." [Evaluation / Method]: This study aimed to establish benchmarks for clinically significant change for the Mo…
OpenAlex AI: no new matching papers today.
"Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows" [Evaluation / Application / Method]: The Model Context Protocol (MCP) has become a common interface…
"Pre-process for segmentation task with nonlinear diffusion filters" [Method]: This paper deals with the case of using nonlinear diffusion filters to obtain piecewise constant images as a previous process for segmentation tec…
"Biomed-DPT: Dual Modality Prompt Tuning for Biomedical Vision-Language Models." [Data / Application / Method]: Prompt learning has emerged as one of the most effective paradigms for adapting pre-trained vision language models (VLMs) to…
OpenAlex AI: no new matching papers today.
"OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Model" [Evaluation / Method]: Large vision-language models (LVLMs) have made substantial advances in reasoning tasks at the Olympiad level. Neverth…
"LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model" [Method]: We present LLaDA2.0-Uni, a unified discrete diffusion large language model (dLLM) that supports multimodal underst…
"Comparative evaluation of large language models for generating CAD-RADS 2.0-compliant diagnostic conclusions in cardiac CT reports." [Evaluation / Application / Method]: OBJECTIVES: Coronary computed tomography angiography (CCTA) has become…
OpenAlex AI: no new matching papers today.
"Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents" [Evaluation / Application / Method]: Long-horizon enterprise agents make high-stakes decisions (loan underwriting, claims adjudication, clinical review, prior authorization)…
"PanDA: Unsupervised Domain Adaptation for Multimodal 3D Panoptic Segmentation in Autonomous Driving" [Evaluation / Method]: This paper presents the first study on Unsupervised Domain Adaptation (UDA) for multimodal 3D panoptic segme…
"Classifying American Society of Anesthesiologists Physical Status With a Low-Rank-Adapted Large Language Model: Development and Validation Study." [Evaluation / Application / Method]: BACKGROUND: The American Society of Anesthesiologists Phy…
OpenAlex AI: no new matching papers today.
"MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval" [Evaluation / Data / Method]: Mathematical problem solving remains a challenging test of reasoning for large language and multimodal models, yet existing…
"AdaCluster: Adaptive Query-Key Clustering for Sparse Attention in Video Generation" [Application / Method]: Video diffusion transformers (DiTs) suffer from prohibitive inference latency due to quadratic attention complexity. Existing…
"Transforming oncology clinical trial matching through neuro-symbolic, multi-agent AI and an oncology-specific knowledge graph: a prospective evaluation in 3804 patients." [Evaluation / Data / Application / Method]: BACKGROUND: Clinical trial e…
OpenAlex AI: no new matching papers today.
LLM: no new matching papers today.
Vision: no new matching papers today.
"Medic Training at Military-Civilian Partnerships-A Narrative Review." [Evaluation / Application / Method]: INTRODUCTION: Military-Civilian Partnerships (MCP) were developed to mitigate degradation of combat medical readiness during peacetime…
"Artificial Intelligence And The Transformation of Labor Markets" [Method]: The rapid advancement of artificial intelligence (AI) technologies, particularly generative AI and large language models, has reignited debates about…
LLM: no new matching papers today.
Vision: no new matching papers today.
PubMed AI: no new matching papers today.
OpenAlex AI: no new matching papers today.
LLM: no new matching papers today.
Vision: no new matching papers today.
"Pretraining effective T5 generative models for clinical and biomedical applications." [Evaluation / Data / Application / Method]: This paper presents a study of the impact of corpus selection and vocabulary design on the performance of T5-base…
OpenAlex AI: no new matching papers today.
"CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas" [Evaluation / Method]: It is increasingly important that LLM agents interact effectively and safely with other goal-pursuing agents, yet, re…
"SegWithU: Uncertainty as Perturbation Energy for Single-Forward-Pass Risk-Aware Medical Image Segmentation" [Application / Method]: Reliable uncertainty estimation is critical for medical image segmentation, where automated contours…
"Applying natural language processing and large language models to clinical notes for phenotyping and diagnosing rare diseases: a systematic review." [Evaluation / Data / Application / Method]: OBJECTIVES: Patients with rare diseases often face…
OpenAlex AI: no new matching papers today.
"GeoAgentBench: A Dynamic Execution Benchmark for Tool-Augmented Agents in Spatial Analysis" [Evaluation / Application / Method]: The integration of Large Language Models (LLMs) into Geographic Information Systems (GIS) marks a paradigm shift…
"ROSE: Retrieval-Oriented Segmentation Enhancement" [Evaluation / Method]: Existing segmentation models based on multimodal large language models (MLLMs), such as LISA, often struggle with novel or emerging entities due to their inab…
"Augmenting Large Language Model With Prompt Engineering and Supervised Fine-Tuning in Non-Small Cell Lung Cancer Tumor-Node-Metastasis Staging: Framework Development and Validation." [Evaluation / Data / Application / Method]: BACKGROUND: Accu…
OpenAlex AI: no new matching papers today.
"Parallax: Why AI Agents That Think Must Never Act" [Evaluation / Application / Method]: Autonomous AI agents are rapidly transitioning from experimental tools to operational infrastructure, with projections that 80% of enterprise application…
"RSGMamba: Reliability-Aware Self-Gated State Space Model for Multimodal Semantic Segmentation" [Evaluation / Method]: Multimodal semantic segmentation has emerged as a powerful paradigm for enhancing scene understanding by leveragin…
"VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model." [Evaluation / Data / Method]: The emergence of Large Vision-Language Models (LVLMs) marks significant strides towards achieving general artifi…
"Demystifying Attitudes and Effects of Usage of Large-Language Models Among College-Aged Students" [Method]: In compiling literature for my senior seminar on combating hallucinations present within responses from large-langua…
"UniToolCall: Unifying Tool-Use Representation, Data, and Evaluation for LLM Agents" [Evaluation / Data / Application / Method]: Tool-use capability is a fundamental component of LLM agents, enabling them to interact with external systems throu…
"OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation" [Evaluation / Data / Application / Method]: In this work, we study Human-Object Interaction Video Generation (HOIVG), which aims to synthesize high-quality…
"Comparing Large Language Models and Traditional Machine Translation Tools for Translating Medical Consultation Summaries: Quantitative Pilot Feasibility Study." [Evaluation / Application / Method]: BACKGROUND: Translation of medical consulta…
"ECO-Charge: Multi-Agent Smart-Charging for Electric Vehicles" [Method]: International audience
LLM: no new matching papers today.
Vision: no new matching papers today.
PubMed AI: no new matching papers today.
OpenAlex AI: no new matching papers today.
LLM: no new matching papers today.
Vision: no new matching papers today.
"Combining structural modeling and deep learning to calculate the E. coli protein interactome and functional networks." [Data / Method]: We report on the integration of three methods that predict, on a proteome-wide scale, whet…
OpenAlex AI: no new matching papers today.
LLM: no new matching papers today.
Vision: no new matching papers today.
"Factors influencing large language model adoption among dental students: a cross-sectional study." [Application / Method]: This research evaluates the factors influencing the behavioural intention (BI) to adopt large language models…
"Coalition Drift: When Agents Drift Together Why multi-agent systems don't just drift individually — they drift as a group, and why that matters more than any single-agent failure mode." [Method]: Most AI governance framework…
LLM: no new matching papers today.
Vision: no new matching papers today.
PubMed AI: no new matching papers today.
OpenAlex AI: no new matching papers today.
LLM: no new matching papers today.
Vision: no new matching papers today.
"Subcategory vs category fluency: Items and networks in healthy young adults and simulation with a large language model." [Evaluation / Application / Method]: Category fluency tasks involve producing words constrained by a semantic field (ani…
15 papers included; highlights include "Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework" and "Topological Characterization of Churn Flow and Unsupervised Correction to the Wu Flow-Regime Map in Small-Diameter Vertic…
10 papers included; highlights include "Action Images: End-to-End Policy Learning via Multiview Video Generation" and "DiffHDR: Re-Exposing LDR Videos with Video Diffusion Models".