in-context learning Topic Archive

in-context-learning.html
Long-term tracking RSS feed for the keyword "in-context learning", collecting all papers matched to date.
zh-CN
Sun, 28 Jun 2026 05:24:06 +0000

MedGuards: Multi-Agent System for Reliable Medical Error Detection and Correction
../papers/arxiv-924e9f45b440.html
https://arxiv.org/abs/2606.25651v1#2026-06-25#in-context-learning
Thu, 25 Jun 2026 13:11:21 +0800
As Large Language Models (LLMs) are increasingly deployed in healthcare settings, accurate error detection and correction in generated or existing text becomes critical, as even minor mistakes can pose risks to patient safety. Existing methods for error detection and correction, including automated checks and heuristic-based approaches, do not generalize well across unseen datasets. In this paper, we propose MedGuards as a medical safety guardrail, which is a new framework that treats medical e…

Pigeonholing: Bad prompts hurt models to collapse and make mistakes
../papers/arxiv-112c872ebf06.html
https://arxiv.org/abs/2606.24267v1#2026-06-24#in-context-learning
Wed, 24 Jun 2026 13:06:49 +0800
While in-context learning is generally shown to be effective in Large Language Models (LLMs), bad contexts can cause performance degradation and mode collapse, a phenomenon we call "pigeonholing." **Unintentionally bad** contexts can happen without malicious jailbreaking intent: for example, a user asks the model to justify an incorrect math theorem or fails to correct the model's buggy code.
Specifically, we investigate "pigeonholing" in two scenarios: (1) when the user suggests a solution,…

Navigating Unreliable Parametric and Contextual Knowledge: Explicit Knowledge Conflict Resolution for LLM Inference
../papers/arxiv-1b4902e41aec.html
https://arxiv.org/abs/2606.20245v1#2026-06-19#in-context-learning
Fri, 19 Jun 2026 14:26:15 +0800
Large language models (LLMs) have achieved strong performance across a wide range of language-based tasks by leveraging both extensive parametric knowledge and in-context learning ability, enabling them to incorporate external information provided in the input prompt. However, the integration of external knowledge can introduce conflicts, not only between the model's internal parametric knowledge and the external information, but also among multiple pieces of external context. Existing approac…

What Do Safety-Aligned LLMs Learn From Mixed Compliance Demonstrations?
../papers/arxiv-d2c45c3b54a7.html
https://arxiv.org/abs/2606.20508v1#2026-06-19#in-context-learning
Fri, 19 Jun 2026 14:26:15 +0800
Prior work has shown that in-context demonstrations can jailbreak language models, but it remains unclear how models interpret different types of compliance demonstrations. We study this by mixing benign compliance demonstrations (non-harmful request, helpful response) with harmful compliance demonstrations (harmful request, helpful response) and testing three hypotheses about how demonstration composition drives harmful compliance.
Across four models, we find that benign and harmful demonstrat…

Querying an astronomical database using large language models: the ALeRCE text-to-SQL system
../papers/arxiv-8a813f327a5a.html
https://arxiv.org/abs/2606.18108v1#2026-06-17#in-context-learning
Wed, 17 Jun 2026 14:22:19 +0800
We develop a text-to-SQL (structured query language) system based on large language models (LLMs) using in-context learning and apply it to the Automatic Learning for the Rapid Classification of Events (ALeRCE) astronomical database. ALeRCE is a community broker for the Zwicky Transient Facility and the Vera C. Rubin Observatory. The system enables users to query the database in natural language (NL) and generates executable SQL queries. To develop and evaluate the system, we constructed a data…

Caliper: Probing Lexical Anchors versus Causal Structure in LLMs
../papers/arxiv-ccfc01d31332.html
https://arxiv.org/abs/2606.04915#2026-06-04#in-context-learning
Thu, 04 Jun 2026 14:02:06 +0800
Large language models reach 50 to 70% accuracy on causal reasoning benchmarks such as CLadder, but it is unclear whether this reflects structural reasoning or lexical pattern matching. We introduce Caliper, a controlled perturbation that replaces semantic variable names with placeholder tokens while preserving the causal graph and probabilistic specification of each question. Across nine instruction-tuned LLMs from 3.8B to 671B and three causal reasoning benchmarks, lexical anonymization yields…

Reasoning over Grammar: Can Synthetic Linguistic Reasoning Traces Enhance Low-Resource Machine Translation?
../papers/arxiv-98760774739b.html
https://arxiv.org/abs/2606.03782#2026-06-03#in-context-learning
Wed, 03 Jun 2026 14:09:56 +0800
Large language models (LLMs) offer a promising approach to machine translation (MT) for extremely low-resource languages by incorporating linguistic resources through in-context learning.
However, LLMs often struggle to apply grammatical information effectively during translation. Inspired by recent progress in chain-of-thought reasoning, we investigate whether low-resource MT can benefit from structured intermediate steps of linguistic analysis and grammatical reasoning. We propose a pipeline…

BioTool: A Comprehensive Tool-Calling Dataset for Enhancing Biomedical Capabilities of Large Language Models
../papers/arxiv-33f9027d56b4.html
https://arxiv.org/abs/2605.05758#2026-05-08#in-context-learning
Fri, 08 May 2026 14:15:32 +0800
Despite the success of large language models (LLMs) on general-purpose tasks, their performance in highly specialized domains such as biomedicine remains unsatisfactory. A key limitation is the inability of LLMs to effectively leverage biomedical tools, which clinical experts and biomedical researchers rely on extensively in daily workflows. While recent general-domain tool-calling datasets have substantially improved the capabilities of LLM agents, existing efforts in the biomedical domain lar…

From Image to Pixels: towards Fine-Grained Medical Vision-Language Models.
../papers/doi-71303bb82f13.html
https://pubmed.ncbi.nlm.nih.gov/41989909/#2026-04-17#in-context-learning
Fri, 17 Apr 2026 11:39:21 +0800
Multimodal large language models (MLLMs) offer immense potential for biomedical AI, yet current applications remain limited to coarse-grained image understanding and basic textual queries, falling short of the fine-grained reasoning required in clinical contexts. In this work, we present a comprehensive solution spanning data, model, and training innovations to advance pixel-level multimodal intelligence in biomedicine. First, we construct MeCoVQA, a new visual-language benchmark that spans eigh…