clinical Topic Archive

clinical Topic Archive clinical.html Long-term tracking RSS feed for the keyword "clinical", aggregating all previously matched papers. zh-CN Wed, 22 Apr 2026 03:37:20 +0000 Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents ../papers/arxiv-d363006cb185.html https://arxiv.org/abs/2604.19457v1#2026-04-22#clinical Wed, 22 Apr 2026 11:37:03 +0800 Long-horizon enterprise agents make high-stakes decisions (loan underwriting, claims adjudication, clinical review, prior authorization) under lossy memory, multi-step reasoning, and binding regulatory constraints. Current evaluation reports a single task-success scalar that conflates distinct failure modes and hides whether an agent is aligned with the standards its deployment environment requires. We propose that long-horizon decision behavior decomposes into four orthogonal alignment axes, e… RF-HiT: Rectified Flow Hierarchical Transformer for General Medical Image Segmentation ../papers/arxiv-ab2855fef1f1.html https://arxiv.org/abs/2604.19570v1#2026-04-22#clinical Wed, 22 Apr 2026 11:37:03 +0800 Accurate medical image segmentation requires both long-range contextual reasoning and precise boundary delineation, a task where existing transformer- and diffusion-based paradigms are frequently bottlenecked by quadratic computational complexity and prohibitive inference latency. We propose RF-HiT, a Rectified Flow Hierarchical Transformer that integrates an hourglass transformer backbone with a multi-scale hierarchical encoder for anatomically guided feature conditioning. Unlike prior diffusi… Classifying American Society of Anesthesiologists Physical Status With a Low-Rank-Adapted Large Language Model: Development and Validation Study. ../papers/doi-8b199115e87e.html https://pubmed.ncbi.nlm.nih.gov/42013456/#2026-04-22#clinical Wed, 22 Apr 2026 11:37:03 +0800 BACKGROUND: The American Society of Anesthesiologists Physical Status (ASA-PS) classification is integral to preoperative risk assessment; yet, assignment remains subjective and labor-intensive. 
Recent large language models (LLMs) process free-text electronic health records (EHRs), but few studies have evaluated parameter-efficient adaptations that both predict ASA-PS and provide clinician-readable rationales. Low-rank adaptation (LoRA) is a parameter-efficient technique that updates only a sma… Enhancing large language model clinical support information with machine learning risk and explainability: a feasibility study. ../papers/doi-eefd4e77621d.html https://pubmed.ncbi.nlm.nih.gov/42012584/#2026-04-22#clinical Wed, 22 Apr 2026 11:37:03 +0800 BACKGROUND: Current machine learning (ML) prediction models offer limited guidance for individualized actionable management. Large language models (LLMs) can transform ML model-predicted risk estimates with Shapley Additive Explanations (SHAP) into clinically meaningful support information, yet the added value of incorporating ML-derived data and the relative performance of different LLMs remain uncertain. To address these gaps, we used our previously developed IMPACT framework to evaluate the… Clinical Model Autophagy: The Risk of Interpretative Drift in Recursive Medical AI. ../papers/doi-637d5e47b283.html https://pubmed.ncbi.nlm.nih.gov/42013455/#2026-04-22#clinical Wed, 22 Apr 2026 11:37:03 +0800 The rapid integration of large language models into electronic medical record systems introduces a critical theoretical vulnerability. Drawing on foundational computer science proofs of "model collapse," this viewpoint introduces the concept of "Clinical Model Autophagy"-a systemic degradation of diagnostic integrity that occurs when clinical artificial intelligence (AI) models are recursively trained on unverified, AI-generated synthetic data. As these recursive models may progressively regres… APSevLM: Acute Pancreatitis Severity Language Model. 
../papers/doi-e00fc28ccec0.html https://pubmed.ncbi.nlm.nih.gov/42013267/#2026-04-22#clinical Wed, 22 Apr 2026 11:37:03 +0800 Approximately one-fifth of patients with acute pancreatitis (AP) develop severe forms, which are associated with high mortality rates, making early prediction of severity crucial for effective patient management. In this study, we present APSevLM (Acute Pancreatitis Severity Language Model), a large language model (LLM)-based approach that integrates admission-time clinical data, imaging reports, and expert knowledge to predict AP severity at an early stage. Through a comprehensive evaluation u… Comparing Clinical Outcomes in Cardiac Surgical Patients Who Receive Sugammadex Versus Placebo: A Prospective Randomized Blinded Controlled Trial. ../papers/doi-ec10f242cbed.html https://pubmed.ncbi.nlm.nih.gov/42012852/#2026-04-22#clinical Wed, 22 Apr 2026 11:37:03 +0800 OBJECTIVES: To compare the difference in the number of cardiopulmonary bypass surgical patients who receive sugammadex vs. placebo and who meet the Society of Thoracic Surgery early extubation quality benchmark. DESIGN: Single-center, randomized, double-blind, placebo-controlled trial. SETTING: Participants were enrolled at a single U.S. hospital between August 2023 and July 2025. PATIENTS: Seventy-four eligible cardiac surgery patients undergoing cardiopulmonary bypass with anticipated institu… Transforming oncology clinical trial matching through neuro-symbolic, multi-agent AI and an oncology-specific knowledge graph: a prospective evaluation in 3804 patients. ../papers/doi-a39ecce65f3a.html https://pubmed.ncbi.nlm.nih.gov/42004487/#2026-04-21#clinical Tue, 21 Apr 2026 11:40:46 +0800 BACKGROUND: Clinical trial enrollment in oncology remains critically low, with fewer than 5% of eligible adults participating, in large part due to the complexity and labor intensity of eligibility screening. 
We prospectively evaluated a neuro-symbolic, multi-agent artificial intelligence (AI) platform integrating domain-specific large language model (LLM) agents, an oncology-specific knowledge graph, a real-time recommendation engine, and human-in-the-loop review to determine whether automated… Developing and evaluating definitions of real-world clinical endpoints for patients with early-stage triple-negative breast cancer using a United States of America secondary database. ../papers/doi-481edb543c43.html https://pubmed.ncbi.nlm.nih.gov/42004488/#2026-04-21#clinical Tue, 21 Apr 2026 11:40:46 +0800 BACKGROUND: The KEYNOTE-522 trial showed that neoadjuvant chemotherapy (NAC) plus adjuvant pembrolizumab improved overall survival, event-free survival (EFS), and pathological complete response (pCR) in high-risk early-stage triple-negative breast cancer. As treatments evolve, evaluating real-world (RW) effectiveness is key to understanding trial generalizability. This study benchmarked RW efficacy endpoints in early-stage triple-negative breast cancer patients treated with NAC. MATERIALS AND M… Investigating fine-tuning versus zero-shot learning for general large language models when predicting cancer survival from initial oncology consultation documents. ../papers/doi-eebfc182eb48.html https://pubmed.ncbi.nlm.nih.gov/42004490/#2026-04-21#clinical Tue, 21 Apr 2026 11:40:46 +0800 BACKGROUND: Unstructured oncology consultation notes contain rich clinical information that may support survival prediction. Open-weight large language models (LLMs) can utilize these notes with zero-shot inference or fine-tuning, but their relative value for this setting remains unclear. 
The objective of this study is to evaluate open-weight LLMs for predicting 60-month survival from initial oncology consultation notes, comparing (i) zero-shot performance, (ii) performance after fine-tuning, a… A Comparative Evaluation of Three Large Language Models for Parent-Centered Questions About Anorexia Nervosa. ../papers/doi-db4a2a7daf35.html https://pubmed.ncbi.nlm.nih.gov/42003757/#2026-04-21#clinical Tue, 21 Apr 2026 11:40:46 +0800 BACKGROUND: Large language models (LLMs) are increasingly used to obtain health information, including guidance on child and adolescent mental health. In anorexia nervosa (AN), where early recognition and timely intervention are critical, the accuracy of AI-generated information available to parents may have important clinical implications. This study evaluated the performance of LLMs in responding to parent-oriented questions about AN. METHODS: A comparative model evaluation was conducted usin… Impacts of Multidisciplinary Lung Cancer Meeting Presentation in a Clinical Quality Registry. ../papers/doi-878d50e507af.html https://pubmed.ncbi.nlm.nih.gov/42006279/#2026-04-21#clinical Tue, 21 Apr 2026 11:40:46 +0800 BACKGROUND: Lung cancer is a heterogeneous and complex disease requiring multidisciplinary input for optimal management planning, with guidelines recommending that all patients be discussed in a multidisciplinary setting. Multidisciplinary meeting (MDM) discussion aims to enhance evidence-based management, improve treatment access, and optimize complex management plans. METHODS: We aimed to assess the extent and impacts of MDM discussion in patients with lung cancer described by the Victorian L… Medic Training at Military-Civilian Partnerships-A Narrative Review. ../papers/doi-00657ec6b105.html https://pubmed.ncbi.nlm.nih.gov/42001305/#2026-04-20#clinical Mon, 20 Apr 2026 11:48:52 +0800 INTRODUCTION: Military-Civilian Partnerships (MCP) were developed to mitigate degradation of combat medical readiness during peacetime. 
Although these programs have historically focused on sustaining surgical readiness and training military physicians, MCP increasingly augment training for Army Combat Medics, Navy Hospital Corpsmen, Air Force Aerospace Service Specialist, and other non-physician military medical personnel. The effectiveness, scalability, and alignment of MCP along with evolving… Pretraining effective T5 generative models for clinical and biomedical applications. ../papers/doi-d4977a45ef49.html https://pubmed.ncbi.nlm.nih.gov/41996418/#2026-04-18#clinical Sat, 18 Apr 2026 11:26:55 +0800 This paper presents a study of the impact of corpus selection and vocabulary design on the performance of T5-based language models in clinical and biomedical domains. We introduce five different T5-EHR models, each pretrained from scratch using different combinations of clinical and biomedical corpora alongside domain-specific vocabularies. We evaluated these models across a variety of clinical and biomedical tasks to quantify the impact of pretraining data and vocabulary tokenization choices o… Comparative performance of large language models and Drugs.com versus Lexicomp for antiseizure medication drug-drug interactions: A cross-sectional study with iterative prompting analysis. ../papers/doi-b257aeab2d15.html https://pubmed.ncbi.nlm.nih.gov/41994367/#2026-04-18#clinical Sat, 18 Apr 2026 11:26:55 +0800 BACKGROUND: Antiseizure medications (ASMs) are frequently co-prescribed and are associated with a high risk of clinically significant drug-drug interactions (DDIs). Large language models (LLMs) are increasingly used for clinical queries, yet their performance in detecting ASM-related DDIs compared with established drug interaction databases remains uncertain. 
METHODS: A cross-sectional comparative study evaluated 186 ASM-comedication pairs (126 classified as major/moderate by Lexicomp) using Ch… An explainable multi-head attention network for healthcare IoT threat detection based on the MedDefender-MHAN framework. ../papers/doi-ff821e86a727.html https://pubmed.ncbi.nlm.nih.gov/41996403/#2026-04-18#clinical Sat, 18 Apr 2026 11:26:55 +0800 The rapid proliferation of Internet of Medical Things (IoMT) devices in healthcare environments has created critical cybersecurity vulnerabilities that demand both accurate and interpretable intrusion detection solutions. Existing deep learning-based intrusion detection systems (IDS) achieve high detection accuracy but lack inherent explainability, limiting their clinical adoption under regulatory frameworks such as GDPR and FDA guidelines. This paper presents MedDefender-MHAN, an explainable m… RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography ../papers/arxiv-d12df90e00da.html https://arxiv.org/abs/2604.15231v1#2026-04-17#clinical Fri, 17 Apr 2026 11:39:21 +0800 Vision-language models (VLM) have markedly advanced AI-driven interpretation and reporting of complex medical imaging, such as computed tomography (CT). Yet, existing methods largely relegate clinicians to passive observers of final outputs, offering no interpretable reasoning trace for them to inspect, validate, or refine. To address this, we introduce RadAgent, a tool-using AI agent that generates CT reports through a stepwise and interpretable process. Each resulting report is accompanied by… SegWithU: Uncertainty as Perturbation Energy for Single-Forward-Pass Risk-Aware Medical Image Segmentation ../papers/arxiv-23036fba0e62.html https://arxiv.org/abs/2604.15271v1#2026-04-17#clinical Fri, 17 Apr 2026 11:39:21 +0800 Reliable uncertainty estimation is critical for medical image segmentation, where automated contours feed downstream quantification and clinical decision support. 
Many strong uncertainty methods require repeated inference, while efficient single-forward-pass alternatives often provide weaker failure ranking or rely on restrictive feature-space assumptions. We present $\textbf{SegWithU}$, a post-hoc framework that augments a frozen pretrained segmentation backbone with a lightweight uncertainty… Applying natural language processing and large language models to clinical notes for phenotyping and diagnosing rare diseases: a systematic review. ../papers/doi-caeec9f876b5.html https://pubmed.ncbi.nlm.nih.gov/41990239/#2026-04-17#clinical Fri, 17 Apr 2026 11:39:21 +0800 OBJECTIVES: Patients with rare diseases often face long delays before receiving a diagnosis. Using electronic health records for automated phenotyping and diagnosis of rare diseases is a promising approach but can be challenging because critical information is often recorded in unstructured notes rather than structured fields. This systematic review synthesizes the current literature applying natural language processing (NLP) and large language models (LLMs) for rare disease phenotyping and dia… Evaluation of large language models with clinical guidance for vetting outpatient magnetic resonance imaging lumbar spine referrals. ../papers/doi-2fe134b4d7bc.html https://pubmed.ncbi.nlm.nih.gov/41989203/#2026-04-17#clinical Fri, 17 Apr 2026 11:39:21 +0800 Objectives: Accurate triage of lumbar spine magnetic resonance imaging (MRI) referrals for sciatica is important for patient assessment, diagnosis and surgical planning. This study evaluates the accuracy and speed of large language models (LLMs) in automatically vetting lumbar spine MRI referrals from general practice. Methods: Three LLMs (GPT-4, Claude Opus, Gemini) were tasked with assigning an outcome (Accept - Routine, Accept - Urgent, Reject) and flagging MRI contraindications for lumbar spine… From Image to Pixels: towards Fine-Grained Medical Vision-Language Models. 
../papers/doi-71303bb82f13.html https://pubmed.ncbi.nlm.nih.gov/41989909/#2026-04-17#clinical Fri, 17 Apr 2026 11:39:21 +0800 Multimodal large language models (MLLMs) offer immense potential for biomedical AI, yet current applications remain limited to coarse-grained image understanding and basic textual queries-falling short of the fine-grained reasoning required in clinical contexts. In this work, we present a comprehensive solution spanning data, model, and training innovations to advance pixel-level multimodal intelligence in biomedicine. First, we construct MeCoVQA, a new visual-language benchmark that spans eigh… Targeted use of large language models for EHR-based computable phenotyping. ../papers/doi-d44eb8c5ebfc.html https://pubmed.ncbi.nlm.nih.gov/41990328/#2026-04-17#clinical Fri, 17 Apr 2026 11:39:21 +0800 OBJECTIVE: Computable phenotypes derived from electronic health records (EHRs) are central to clinical research and quality reporting. Although large language models (LLMs) can extract clinically rich information from unstructured notes, routine application to all patients is computationally expensive. We evaluated whether uncertainty-guided selective use of LLMs can improve phenotyping accuracy while preserving scalability. MATERIALS AND METHODS: We developed a selective augmentation framework… Dual perspectives on large language models in rheumatology: physician-rated quality and patient-centered usability of GPT-4o versus DeepSeek-V3. ../papers/doi-fa629176d611.html https://pubmed.ncbi.nlm.nih.gov/41989204/#2026-04-17#clinical Fri, 17 Apr 2026 11:39:21 +0800 OBJECTIVES: This study conducted an informatics system evaluation of two LLMs (GPT-4o and DeepSeek-V3) for patient education, combining clinician-rated quality with patient-perceived usability across thematically stratified queries. 
MATERIALS AND METHODS: In a blinded, within-subject design, 16 frequently asked questions about biologic therapies were categorized into three domains: treatment/drug selection, safety/adverse effects, and special conditions/daily life. Responses were standardized,… MedRCube: A Multidimensional Framework for Fine-Grained and In-Depth Evaluation of MLLMs in Medical Imaging ../papers/arxiv-309351a1c9e5.html https://arxiv.org/abs/2604.13756v1#2026-04-16#clinical Thu, 16 Apr 2026 11:43:00 +0800 The potential of Multimodal Large Language Models (MLLMs) in the domain of medical imaging raises the demand for systematic and rigorous evaluation frameworks that are aligned with real-world medical imaging practice. Existing practices that report single or coarse-grained metrics lack the granularity required for specialized clinical support and fail to assess the reliability of reasoning mechanisms. To address this, we propose a paradigm shift toward multidimensional, fine-grained and in-d… PBE-UNet: A light weight Progressive Boundary-Enhanced U-Net with Scale-Aware Aggregation for Ultrasound Image Segmentation ../papers/arxiv-f0b69cb6a500.html https://arxiv.org/abs/2604.13791v1#2026-04-16#clinical Thu, 16 Apr 2026 11:43:00 +0800 Accurate lesion segmentation in ultrasound images is essential for preventive screening and clinical diagnosis, yet remains challenging due to low contrast, blurry boundaries, and significant scale variations. Although existing deep learning-based methods have achieved remarkable performance, these methods still struggle with scale variations and indistinct tumor boundaries. To address these challenges, we propose a progressive boundary enhanced U-Net (PBE-UNet). Specifically, we first introduce a… Augmenting Large Language Model With Prompt Engineering and Supervised Fine-Tuning in Non-Small Cell Lung Cancer Tumor-Node-Metastasis Staging: Framework Development and Validation. 
../papers/doi-39281e964532.html https://pubmed.ncbi.nlm.nih.gov/41984624/#2026-04-16#clinical Thu, 16 Apr 2026 11:43:00 +0800 BACKGROUND: Accurate tumor node metastasis (TNM) staging is fundamental for treatment planning and prognosis in non-small cell lung cancer (NSCLC). However, its complexity poses significant challenges. Traditional rule-based natural language processing methods are constrained by their reliance on manually crafted rules and are susceptible to inconsistencies in clinical reporting. OBJECTIVE: This study aimed to develop and validate a robust, accurate, and operationally efficient artificial intel… PKFAR: psychiatry knowledge-fused augmented reasoning with large language models. ../papers/doi-5a4aadf4d2b0.html https://pubmed.ncbi.nlm.nih.gov/41982804/#2026-04-16#clinical Thu, 16 Apr 2026 11:43:00 +0800 PURPOSE: Psychiatric diagnosis faces significant challenges due to subjective symptom reporting and complex diagnostic criteria. While Large Language Models (LLMs) offer potential clinical decision support, their implementation is hindered by privacy constraints on commercial models (e.g., GPT-o3, Gemini-2.5) and computational demands of massive-scale open-source alternatives (e.g., DeepSeek-R1). These constraints necessitate knowledge-enhanced approaches with smaller-scale LLMs as the primary… Fact-Checking Large Language Model Responses to a Health Care Prompt: Comparative Study. ../papers/doi-442942d6cd6f.html https://pubmed.ncbi.nlm.nih.gov/41985066/#2026-04-16#clinical Thu, 16 Apr 2026 11:43:00 +0800 BACKGROUND: Large language models use machine learning to produce natural language. These models have a range of potential applications in health care, such as patient education and diagnosis. However, evaluations of large language models in health care are still scarce. 
OBJECTIVE: This study aimed to (1) evaluate the accuracy and efficiency of automated fact-checking by 2 large language models and (2) illustrate a process through which a large language model might support a patient in redrafti… Fine-Tuned Large Language Models for Automated Radiology Impression Generation: A Multicenter Evaluation. ../papers/doi-80e22cb1c8f2.html https://pubmed.ncbi.nlm.nih.gov/41983921/#2026-04-16#clinical Thu, 16 Apr 2026 11:43:00 +0800 Purpose To develop a fine-tuned large language model (Medical Imaging Report Assistant, MIRA) and evaluate its performance in generating radiology impressions from multicenter data with respect to accuracy, reporting efficiency, and clinical applicability. Materials and Methods A retrospective multicenter dataset comprising 1.87 million radiology reports (including CT, MRI, and digital radiography data) from 42 hospitals across 22 provinces in China (January 2019 to August 2024) was compiled. T… A Multi-AI Agent Framework for Interactive Neurosurgical Education and Evaluation: From Vignettes to Virtual Conversations. ../papers/doi-1c2530337309.html https://pubmed.ncbi.nlm.nih.gov/41982325/#2026-04-16#clinical Thu, 16 Apr 2026 11:43:00 +0800 BACKGROUND AND OBJECTIVES: Traditional medical board examinations present clinical information in static vignettes with multiple-choices (MC), fundamentally different from how physicians gather and integrate data in practice. Recent advances in large language models (LLMs) offer promising approaches to creating more realistic clinical interactive conversations. 
However, these approaches are limited in neurosurgery, where patient communication capacity varies significantly and diagnosis heavily… Probabilistic Feature Imputation and Uncertainty-Aware Multimodal Federated Aggregation ../papers/arxiv-b480ff0cabeb.html https://arxiv.org/abs/2604.12970v1#2026-04-15#clinical Wed, 15 Apr 2026 11:35:50 +0800 Multimodal federated learning enables privacy-preserving collaborative model training across healthcare institutions. However, a fundamental challenge arises from modality heterogeneity: many clinical sites possess only a subset of modalities due to resource constraints or workflow variations. Existing approaches address this through feature imputation networks that synthesize missing modality representations, yet these methods produce point estimates without reliability measures, forcing downs… AbdomenGen: Sequential Volume-Conditioned Diffusion Framework for Abdominal Anatomy Generation ../papers/arxiv-7f4fd8c173f5.html https://arxiv.org/abs/2604.12969v1#2026-04-15#clinical Wed, 15 Apr 2026 11:35:50 +0800 Computational phantoms are widely used in medical imaging research, yet current systems to generate controlled, clinically meaningful anatomical variations remain limited. We present AbdomenGen, a sequential volume-conditioned diffusion framework for controllable abdominal anatomy generation. We introduce the \textbf{Volume Control Scalar (VCS)}, a standardized residual that decouples organ size from body habitus, enabling interpretable volume modulation. Organ masks are synthesized sequentiall… Multimodal large language models in brain tumor imaging: clinical applications and future perspectives. ../papers/doi-fb5d26b2eb57.html https://pubmed.ncbi.nlm.nih.gov/41979660/#2026-04-15#clinical Wed, 15 Apr 2026 11:35:50 +0800 The use of multimodal data is essential for the precise diagnosis and treatment of brain tumors. 
In this context, multimodal data encompass multisequence magnetic resonance imaging, computed tomography, positron emission tomography, histopathological images, molecular and genomic profiles, structured clinical variables, and radiological reports. With the rapid advancement of artificial intelligence, integrating these heterogeneous data sources has become a central research direction for improvi… Bridging the Modality Gap in Medical Vision-Language Models: A Hybrid Contrastive-Optimal Transport Framework for Enhanced Cross-Modal Alignment. ../papers/doi-48f3f7f35ec5.html https://pubmed.ncbi.nlm.nih.gov/41979955/#2026-04-15#clinical Wed, 15 Apr 2026 11:35:50 +0800 Vision-language models in healthcare face a critical limitation, i.e., the modality gap, where image and text embeddings occupy distantly separated regions in shared representation space. This is reinforced by traditional contrastive learning objectives, and manifests itself through fundamental constraints in cross-modal understanding and downstream task performance. Existing approaches focus on addressing input-level requirements, however, the geometric constraints imposed by multimodal contra… User Experience and Early Clinical Outcomes of a Mental Wellness Chatbot for Depression and Anxiety: Pilot Evaluation Mixed Methods Study. ../papers/doi-d5f518895cd3.html https://pubmed.ncbi.nlm.nih.gov/41980262/#2026-04-15#clinical Wed, 15 Apr 2026 11:35:50 +0800 BACKGROUND: Artificial intelligence-powered conversational agents (ie, chatbots) are increasingly popular outlets for users seeking psychological support, yet little is known about how users experience early-stage prototypes or which therapeutic processes contribute to clinical improvement. A transparent evaluation of emerging chatbot prototypes is needed to clarify if, how, and why artificial intelligence companions work and to guide their continued development. 
OBJECTIVE: This mixed methods p… Comparison of AI-based Chatbot Performance in Analyzing Clinical Scenarios versus Medical Residents: A Novel Approach in Chest Diseases Education. ../papers/doi-e4a154d94827.html https://pubmed.ncbi.nlm.nih.gov/41979097/#2026-04-15#clinical Wed, 15 Apr 2026 11:35:50 +0800 OBJECTIVE: Rapid advancements in artificial intelligence (AI) technologies offer new opportunities in medical education. The aim of this study is to compare the performance of large language models, specifically ChatGPT-4 and Gemini, in analyzing clinical scenarios with that of chest diseases research assistants (residents), and to evaluate their potential roles in medical education. MATERIAL AND METHODS: This cross-sectional, comparative study included 28 resident physicians working in the dep… GazeVaLM: A Multi-Observer Eye-Tracking Benchmark for Evaluating Clinical Realism in AI-Generated X-Rays ../papers/doi-797bb9dad901.html https://arxiv.org/abs/2604.11653v1#2026-04-14#clinical Tue, 14 Apr 2026 11:37:06 +0800 We introduce GazeVaLM, a public eye-tracking dataset for studying clinical perception during chest radiograph authenticity assessment. The dataset comprises 960 gaze recordings from 16 expert radiologists interpreting 30 real and 30 synthetic chest X-rays (generated by diffusion based generative AI) under two conditions: diagnostic assessment and real-fake classification (Visual Turing test). For each image-observer pair, we provide raw gaze samples, fixation maps, scanpaths, saliency density m… Budget-Aware Uncertainty for Radiotherapy Segmentation QA Using nnU-Net ../papers/arxiv-dd7f6721d11f.html https://arxiv.org/abs/2604.11798v1#2026-04-14#clinical Tue, 14 Apr 2026 11:37:06 +0800 Accurate delineation of the Clinical Target Volume (CTV) is essential for radiotherapy planning, yet remains time-consuming and difficult to assess, especially for complex treatments such as Total Marrow and Lymph Node Irradiation (TMLI). 
While deep learning-based auto-segmentation can reduce workload, safe clinical deployment requires reliable cues indicating where models may be wrong. In this work, we propose a budget-aware uncertainty-driven quality assurance (QA) framework built on nnU-Net,… Efficient KernelSHAP Explanations for Patch-based 3D Medical Image Segmentation ../papers/arxiv-f426fe16d894.html https://arxiv.org/abs/2604.11775v1#2026-04-14#clinical Tue, 14 Apr 2026 11:37:06 +0800 Perturbation-based explainability methods such as KernelSHAP provide model-agnostic attributions but are typically impractical for patch-based 3D medical image segmentation due to the large number of coalition evaluations and the high cost of sliding-window inference. We present an efficient KernelSHAP framework for volumetric CT segmentation that restricts computation to a user-defined region of interest and its receptive-field support, and accelerates inference via patch logit caching, reusin… Seeing Through the Tool: A Controlled Benchmark for Occlusion Robustness in Foundation Segmentation Models ../papers/arxiv-850cced4e5aa.html https://arxiv.org/abs/2604.11711v1#2026-04-14#clinical Tue, 14 Apr 2026 11:37:06 +0800 Occlusion, where target structures are partially hidden by surgical instruments or overlapping tissues, remains a critical yet underexplored challenge for foundation segmentation models in clinical endoscopy. We introduce OccSAM-Bench, a benchmark designed to systematically evaluate SAM-family models under controlled, synthesized surgical occlusion. Our framework simulates two occlusion types (i.e., surgical tool overlay and cutout) across three calibrated severity levels on three public polyp… Comparing Large Language Models and Traditional Machine Translation Tools for Translating Medical Consultation Summaries: Quantitative Pilot Feasibility Study. 
../papers/doi-d201429cca0c.html https://pubmed.ncbi.nlm.nih.gov/41973653/#2026-04-14#clinical Tue, 14 Apr 2026 11:37:06 +0800 BACKGROUND: Translation of medical consultation summaries is essential for equitable health care communication in culturally and linguistically diverse populations. While machine translation (MT) tools and large language models (LLMs) are widely accessible, their feasibility and safety for health care contexts remain underexplored. OBJECTIVE: This pilot study investigates the feasibility and limitations of using LLMs and traditional MT tools to translate medical consultation summaries from Engl… Toward Sustainable Clinical Analysis: Benchmarking Plastic Use in LC-MS Sample Preparation - Exemplified by Ketamine Analogues in Whole Blood. ../papers/doi-a4e602e64d3b.html https://pubmed.ncbi.nlm.nih.gov/41972595/#2026-04-14#clinical Tue, 14 Apr 2026 11:37:06 +0800 The aim of this study was to assess and benchmark plastic consumption in sample preparation for forensic analysis, alongside the development of an LC-MS method for ketamine analogues in whole blood, with various sustainability-related scores and parameters examined throughout. Ketamine analogues are emerging psychoactive substances associated with intoxication and fatalities globally. An analytical method was developed for determining ketamine and eight of its analogues in human whole blood. Fo… Diversity in clinical Trials: The example of systemic lupus erythematosus. ../papers/doi-ce81229c31f1.html https://pubmed.ncbi.nlm.nih.gov/41969623/#2026-04-14#clinical Tue, 14 Apr 2026 11:37:06 +0800 OBJECTIVE: The FDA requires clinical trials to reflect real-world diversity. Systemic lupus erythematosus (SLE) is a disease that disproportionately affects individuals of Black African descent that has not been assessed for diversity in clinical trials to date. This study compared demographics from two real-world data (RWD) sources and proposes parameters for representative trial populations. 
METHODS: Demographics of United States (US) SLE patients were extracted from electronic health records… Comparative Performance of Gemini 3 Pro and GPT-5 Family Models on Ophthalmology Board-Style Questions. ../papers/doi-a326948aeb7e.html https://pubmed.ncbi.nlm.nih.gov/41970036/#2026-04-14#clinical Tue, 14 Apr 2026 11:37:06 +0800 OBJECTIVE: To compare the performance of state-of-the-art Gemini and GPT models on ophthalmology board-style questions and examine variation by subspecialty, cognitive complexity, and question type. DESIGN: A cross-sectional evaluation of 12 distinct large language model (LLM) configurations using a standardized ophthalmology question set. SUBJECTS: Five hundred multiple-choice questions (250 from the American Academy of Ophthalmology's Basic and Clinical Science Course [BCSC]; 250 StatPearls).… Evaluating the clinical decision-making performance of large language models in clinically oriented thoracic anatomy scenarios: a comparative evaluation study. ../papers/doi-4f9dec389ad2.html https://pubmed.ncbi.nlm.nih.gov/41963950/#2026-04-11#clinical Sat, 11 Apr 2026 23:09:08 +0800 Source group: PubMed AI. Archive date: 2026-04-11. Exploratory study of large language models in surgical decision-making for lumbar disc herniation: a multicenter analysis based on multisource clinical information. ../papers/doi-a877eb92b775.html https://pubmed.ncbi.nlm.nih.gov/41963879/#2026-04-11#clinical Sat, 11 Apr 2026 23:09:08 +0800 Source group: PubMed AI. Archive date: 2026-04-11. A hybrid large language model framework for structured data entry from code-switched persian clinical speech. ../papers/doi-cecfcd7bdb9a.html https://pubmed.ncbi.nlm.nih.gov/41963402/#2026-04-11#clinical Sat, 11 Apr 2026 23:09:08 +0800 Source group: PubMed AI. Archive date: 2026-04-11. Factors influencing large language model adoption among dental students: a cross-sectional study. 
../papers/doi-d7ceffe25bc3.html https://pubmed.ncbi.nlm.nih.gov/41963506/#2026-04-11#clinical Sat, 11 Apr 2026 23:09:08 +0800 This research evaluates the factors influencing the behavioural intention (BI) to adopt large language models (LLMs) among dental students in education, clinical decision support (CDS), and research, using the original unified theory of acceptance and use of technology (UTAUT) model, representing the first application of this model in this specific context. LLM adoption among Saudi dental students is unstructured and unregulated, making empirical evidence on adoption factors an educational and… ClinicRealm: Re-evaluating large language models with conventional machine learning for non-generative clinical prediction tasks. ../papers/pubmed-0be57f95f498.html https://pubmed.ncbi.nlm.nih.gov/41951858/#2026-04-09#clinical Thu, 09 Apr 2026 14:51:56 +0800 Large Language Models (LLMs) are increasingly deployed in medicine. However, their utility for non-generative clinical prediction is under-evaluated, and they are often assumed to be inferior to specialized models, creating potential for misuse and misunderstanding. To address this, our ClinicRealm benchmark systematically evaluates 15 GPT-style LLMs, 5 BERT-style models, and 11 traditional methods on unstructured clinical notes and structured Electronic Health Records (EHR) across predictive p…