Keyword Tracking

Keyword tracking: large language model

This page continuously tracks the keywords you configured and archives the matching papers by date.

Recent trend

The most recent hit came from LM: NuclearQAv2: A Structured Benchmark for Evaluating Domain-Science Competence in Large Language Models

[Trend chart: date axis from 2026-06-15 to 2026-06-28]

Hit details

Review, by date, the titles of papers that matched this keyword, with their source feed information retained.

Many decoding methods for large language models can be understood as shifting probability mass toward outputs that are more likely under the model, either locally at the token lev…

Increasing demand for precise and reliable control in complex scenarios has led to the development of increasingly sophisticated controllers, including data-driven approaches empl…

MiniOpt: Reasoning to Model and Solve General Optimization Problems with Limited Resources

Achieving strong optimization generalization across diverse optimization problems while requiring limited training resources remains a challenging problem for optimization-oriente…

Terminal and SWE Agents

Evaluating LLMs on Real-World Software Performance Optimization

Software performance optimization is a notoriously complex and manual task. Despite the growing use of Large Language Models (LLMs) for code refinement, we still lack benchmarks t…

As large language model (LLM) agents are applied to longer tasks, they increasingly modify workspace state across multiple rounds of iteration. However, agents typically observe o…

Large Language Models (LLMs) are frequently portrayed as general-purpose solvers capable of solving arbitrary tasks. We argue that this view overlooks a fundamental constraint: la…

Agent Runtime Security

GIF: Locally Sound Geometric Information Flow Control for LLMs

Large language models increasingly mediate interactions between sensitive data, untrusted inputs, and privileged actions in agentic systems, creating security and privacy risks. T…

Enhancing the formal math reasoning capabilities of Large Language Models (LLMs) has become a key focus in both mathematical and computer science communities in recent years. Whil…

Agent Runtime Security

CodeSentinel: A Three-Layer Defense Against Indirect Prompt Injection in Code Contexts

Code large language models increasingly retrieve external code context from repositories, documentation, issue threads, and coding-agent environments, creating an indirect prompt-…

On-policy self-distillation (OPSD) has proven effective for post-training large language models (LLMs), yet its application to diffusion LLMs (dLLMs) remains unexplored. Existing…

Agent Runtime Security

A Red-Team Study of Anthropic Fable 5 & Opus 4.8 Models

We evaluate the adversarial robustness of two frontier large language models (LLMs) developed by Anthropic, Fable 5 and Opus 4.8, against four families of automated jailbreak atta…

Large Language Models (LLMs) have significantly advanced the automation of software engineering tasks. One prominent example is code generation, where an LLM produces code in a sp…

Large language model (LLM) agents increasingly act on a user's behalf -- reading personal files, calling tools, transacting with external services -- possibly leaking personally i…

Warning: This paper contains several toxic and offensive statements. Modern large language models (LLMs) are typically aligned through large-scale post-training to ensure fair and…

Learning everyday skills, like cooking a dish, relies increasingly on instructional media such as online videos. This opens the door to the use of video (and multimodal) large lan…

Agent Runtime Security

What the Eyes See, the LLMs Miss: Exploiting Human Perception for Adversarial Text Attacks

Large language model (LLM)-powered content moderation systems have become a critical defense against harmful online content. However, these systems primarily operate on tokenized…

Automated unit test generation using large language models (LLMs) holds great promise but often struggles with generating tests that are both correct and maintainable in real-worl…

Pre-deployment verification of enterprise artificial intelligence (AI) agents remains a critical gap between large language model (LLM) capability benchmarking and production depl…

Automating C-to-Rust migration is critical for improving software security without sacrificing performance. Traditional rule-based methods struggle with diverse C idioms, often pr…

As Large Language Model (LLM) agents increasingly leverage the Model Context Protocol (MCP) to operate in complex environments, the expansion of their action spaces offers agents…

Maintaining the safety of large language models (LLMs) is crucial as they are increasingly deployed in real-world applications. Existing safety guardrails typically rely on single…

Zeroth-order (ZO) fine-tuning is attractive for large language models because it replaces backpropagation with forward objective evaluations. Existing implementations nevertheless…

Background: Large language models are typically evaluated as models, benchmarks, or short conversational episodes. Less is known about what happens when an agent is embedded persi…

This paper applies machine learning to the difficult and important task of version control merging. (1) We constructed a dataset, Merge-Bench, of 7938 real-world merge conflict hu…

Agent Runtime Security

AgentHijack: Benchmarking Computer Use Agent Robustness to Common Environment Corruptions

Autonomous computer use agents that powered by multimodal large language models (MLLMs) are emerging as capable assistants for completing complex digital workflows. However, real-…

While current software agents powered by large language models (LLMs) and agentic reinforcement learning (RL) can boost programmer productivity, their training data (e.g., GitHub…

Large language model (LLM) agents often rely on long sequences of low-level textual actions, resulting in large effective decision horizons and high inference cost. While prior wo…

AMR-SD: Asymmetric Meta-Reflective Self-Distillation for Token-Level Credit Assignment

The alignment of Large Language Models (LLMs) for complex reasoning heavily relies on Reinforcement Learning with Verifiable Rewards (RLVR). However, standard algorithms like GRPO…

Agent Runtime Security

Multilingual jailbreaking of LLMs using low-resource languages

Large Language Models (LLMs) remain vulnerable to jailbreak attempts that circumvent safety guardrails. We investigate whether multi-turn conversations using low-resource African…

Large language model (LLM) agents require long-term memory to leverage information from past interactions. However, existing memory systems often face a fidelity--efficiency trade…

Large Language Models (LLMs) have demonstrated remarkable abilities in reasoning. However, maximizing their potential through inference-time scaling faces challenges in trade-off…

Terminal and SWE Agents

SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades

Coding agents powered by large language models are increasingly expected to perform realistic software maintenance tasks beyond isolated issue resolution. Existing benchmarks have…

Prompt specifications for multi-agent large language model (LLM) systems carry data contracts and integration logic across many interdependent files but are rarely subjected to st…

Instruction Lens Score: Your Instruction Contributes a Powerful Object Hallucination Detector for Multimodal Large Language Models

Multimodal large language models (MLLMs) have achieved remarkable progress, yet the object hallucination remains a critical challenge for reliable deployment. In this paper, we pr…

Agent Runtime Security

Metaphor Is Not All Attention Needs

Large language models are increasingly deployed in safety-critical applications, where their ability to resist harmful instructions is essential. Although post-training aims to ma…

This paper demonstrates RUBEN, an interactive tool for discovering minimal rules to explain the outputs of retrieval-augmented large language models (LLMs) in data-driven applicat…

Retrieval-Augmented Generation (RAG) enables large language models (LLMs) to leverage external knowledge, but also exposes valuable RAG databases to leakage attacks. As RAG system…

Automated Machine Learning (AutoML) frameworks increasingly leverage Large Language Models (LLMs) for tasks such as hyperparameter optimization and neural architecture code genera…

With the rapid development of generative artificial intelligence, large language models (LLMs) have gradually integrated into various fields, demonstrating significant potential,…

Electrocardiography (ECG) is a fundamental tool for diagnosing cardiovascular diseases, yet the scarcity of large-scale annotated data limits the applicability of supervised learn…

PubMed AI

Learning from Prototypes: Contrastive Learning with Prior-Aware Multi-Label Chest X-ray Classification.

Multi-label Chest X-ray (CXR) classification faces significant challenges from the inherently imperfect nature of clinical data, particularly the complex interplay of co-occurring…

OBJECTIVES: Coronary computed tomography angiography (CCTA) has become a cornerstone in non-invasive CAD diagnosis and risk stratification. To standardize reporting and improve cl…

The rapid integration of large language models into electronic medical record systems introduces a critical theoretical vulnerability. Drawing on foundational computer science pro…

PubMed AI

APSevLM: Acute Pancreatitis Severity Language Model.

Approximately one-fifth of patients with acute pancreatitis (AP) develop severe forms, which are associated with high mortality rates, making early prediction of severity crucial…

BACKGROUND: Large language models (LLMs) are increasingly used to obtain health information, including guidance on child and adolescent mental health. In anorexia nervosa (AN), wh…

2026-04-20

2026-04-20 11:48:52 (Asia/Shanghai)

OpenAlex AI

Artificial Intelligence And The Transformation of Labor Markets

The rapid advancement of artificial intelligence (AI) technologies, particularly generative AI and large language models, has reignited debates about the future of work and the po…

Dual perspectives on large language models in rheumatology: physician-rated quality and patient-centered usability of GPT-4o versus DeepSeek-V3.

OBJECTIVES: This study conducted an informatics system evaluation of two LLMs (GPT-4o and DeepSeek-V3) for patient education, combining clinician-rated quality with patient-percei…

BACKGROUND AND OBJECTIVES: Traditional medical board examinations present clinical information in static vignettes with multiple-choices (MC), fundamentally different from how phy…

2026-04-15

2026-04-15 11:35:50 (Asia/Shanghai)

LLM

AISafetyBenchExplorer: A Metric-Aware Catalogue of AI Safety Benchmarks Reveals Fragmented Measurement and Weak Benchmark Governance

The rapid expansion of large language model (LLM) safety evaluation has produced a substantial benchmark ecosystem, but not a correspondingly coherent measurement ecosystem. We pr…

Vision

All in One: A Unified Synthetic Data Pipeline for Multimodal Video Understanding

Training multimodal large language models (MLLMs) for video understanding requires large-scale annotated data spanning diverse tasks such as object counting, question answering, a…

PubMed AI

Multimodal large language models in brain tumor imaging: clinical applications and future perspectives.

The use of multimodal data is essential for the precise diagnosis and treatment of brain tumors. In this context, multimodal data encompass multisequence magnetic resonance imagin…

PubMed AI

User Experience and Early Clinical Outcomes of a Mental Wellness Chatbot for Depression and Anxiety: Pilot Evaluation Mixed Methods Study.

BACKGROUND: Artificial intelligence-powered conversational agents (ie, chatbots) are increasingly popular outlets for users seeking psychological support, yet little is known abou…

PubMed AI

Comparison of AI-based Chatbot Performance in Analyzing Clinical Scenarios versus Medical Residents: A Novel Approach in Chest Diseases Education.

OBJECTIVE: Rapid advancements in artificial intelligence (AI) technologies offer new opportunities in medical education. The aim of this study is to compare the performance of lar…

Researchers and clinicians are increasingly looking to leverage artificial intelligence (AI) and digital tools to improve psychiatric care. Of particular promise is addressing the…

2026-04-08

2026-04-08 17:10:24 (Asia/Shanghai)

LLM

Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework

The rapid growth of scientific literature has made it increasingly difficult for researchers to efficiently discover, evaluate, and synthesize relevant work. Recent advances in mu…

LLM

Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement

Whether Large Language Models (LLMs) develop coherent internal world models remains a core debate. While conventional Next-Token Prediction (NTP) focuses on one-step-ahead supervi…

LLM

Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents

Large language models are increasingly deployed as autonomous agents executing multi-step workflows in real-world software environments. However, existing agent benchmarks suffer…

Vision

Scientific Graphics Program Synthesis via Dual Self-Consistency Reinforcement Learning

Graphics Program Synthesis is pivotal for interpreting and editing visual data, effectively facilitating the reverse-engineering of static visuals into editable TikZ code. While T…