<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>instruction tuning Topic Archive</title>
<link>instruction-tuning.html</link>
<description>Long-term tracking RSS feed for the keyword "instruction tuning", aggregating all previously matched papers.</description>
<language>zh-CN</language>
<lastBuildDate>Sun, 28 Jun 2026 05:24:06 +0000</lastBuildDate>
<item>
<title>SARA: Unlocking Multilingual Knowledge in Mixture-of-Experts via Semantically Anchored Routing Alignment</title>
<link>../papers/arxiv-26ee80b3fcf6.html</link>
<guid>https://arxiv.org/abs/2606.25821v1#2026-06-25#instruction-tuning</guid>
<pubDate>Thu, 25 Jun 2026 13:11:21 +0800</pubDate>
<description>Sparse Mixture-of-Experts (MoE) architectures have emerged as an increasingly influential paradigm as they offer a strategic balance between parameter scalability and computational efficiency. However, low-resource languages, which suffer from a scarcity of high-quality training data, often have their tokens routed to different experts than those predominantly activated by high-resource inputs, which limits cross-lingual expert sharing. This cross-lingual routing divergence consequently hinders…</description>
</item>
<item>
<title>Evaluation Awareness Is Not One Capability: Evidence from Open Language Models</title>
<link>../papers/arxiv-ee26aecb66ba.html</link>
<guid>https://arxiv.org/abs/2606.23583v1#2026-06-23#instruction-tuning</guid>
<pubDate>Tue, 23 Jun 2026 13:10:02 +0800</pubDate>
<description>Safety benchmarks assume that test-condition behavior predicts deployment behavior, an assumption that fails if models detect evaluation cues and adapt. This opens a gap between benchmark performance and deployment behavior: compliance measured under test conditions becomes an optimistic upper bound that overstates how safely a model behaves once the evaluation harness is removed. We characterize this evaluation awareness through eight experiments across 37 open-weight models and seven families…</description>
</item>
<item>
<title>Trade-offs in Medical LLM Adaptation: An Empirical Study in French QA</title>
<link>../papers/arxiv-f93c4a971aed.html</link>
<guid>https://arxiv.org/abs/2606.19266v1#2026-06-18#instruction-tuning</guid>
<pubDate>Thu, 18 Jun 2026 14:03:08 +0800</pubDate>
<description>The development of large language models (LLMs) has led to an increased focus on their adaptation to specialized domains and languages, yet the effectiveness of domain adaptation strategies remains unclear. We present a study of medical domain adaptation using French medical question-answering (QA) as a case study. We compare continual pretraining (CPT), supervised fine-tuning (SFT), and their combination across three model families, multiple sizes, and three initialization types, explicitly di…</description>
</item>
<item>
<title>LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling</title>
<link>../papers/arxiv-1ffe5e95cd6c.html</link>
<guid>https://arxiv.org/abs/2606.18023v1#2026-06-17#instruction-tuning</guid>
<pubDate>Wed, 17 Jun 2026 14:22:19 +0800</pubDate>
<description>Looped Transformers scale latent computation by repeatedly applying shared blocks, but sequential looping increases latency and KV-cache memory with the loop count. Parallel loop Transformers (PLT) alleviate this cost through cross-loop position offsets (CLP) and shared-KV gated sliding-window attention, making loop count a practical design choice. We therefore study PLT loop-count selection through a gain–cost view: an extra loop may refine representations, but CLP also introduces a positiona…</description>
</item>
<item>
<title>VoidPadding: Let [VOID] Handle Padding in Masked Diffusion Language Models so that [EOS] Can Focus on Semantic Termination</title>
<link>../papers/arxiv-026cd79abeae.html</link>
<guid>https://arxiv.org/abs/2606.17999v1#2026-06-17#instruction-tuning</guid>
<pubDate>Wed, 17 Jun 2026 14:22:19 +0800</pubDate>
<description>Masked diffusion language models (MDLMs) generate text by denoising a preallocated masked response canvas, making response-length modeling central to instruction tuning. Existing MDLMs often inherit the autoregressive convention of using repeated [EOS] tokens for padding during instruction tuning, giving [EOS] a dual role as both a semantic terminator and a padding token. We show that this dual role is a root cause of [EOS] overflow under large-block decoding. To decouple these roles, we propose VoidPa…</description>
</item>
<item>
<title>Exploring Adversarial Robustness and Safety Alignment in Multilingual Multi-Modal Large Language Models</title>
<link>../papers/arxiv-4c2a3c128e72.html</link>
<guid>https://arxiv.org/abs/2606.03793#2026-06-03#instruction-tuning</guid>
<pubDate>Wed, 03 Jun 2026 14:09:56 +0800</pubDate>
<description>Multimodal Large Language Models (MLLMs) integrate visual perception into language reasoning, introducing a continuous attack surface susceptible to adversarial attacks. Prior work on MLLM robustness has focused largely on English-centric tasks, leaving multilingual behaviour unexplored. We address this gap through a systematic study of adversarial robustness and multimodal safety across 12 diverse languages, evaluating open-source MLLMs that acquire multilingual capability through instruction tuning.…</description>
</item>
<item>
<title>Large Language Models Are Overconfident in Their Own Responses</title>
<link>../papers/arxiv-4a465132f62b.html</link>
<guid>https://arxiv.org/abs/2606.03437#2026-06-03#instruction-tuning</guid>
<pubDate>Wed, 03 Jun 2026 14:09:56 +0800</pubDate>
<description>Prior work has shown that instruction-tuned large language models (LLMs) are less well calibrated than their base pre-trained counterparts. However, little is known about the frequently used chat template&#x27;s effect on the calibration of conversational LLMs. In this work, we investigate the mechanisms driving this miscalibration by decoupling the effects of the post-training algorithm and the chat format. We find that, while instruction tuning fundamentally harms calibration, the chat template ag…</description>
</item>
<item>
<title>ProtoAda: Prototype-Guided Adaptive Adapter Expansion and Geometric Consolidation for Multimodal Continual Instruction Tuning</title>
<link>../papers/arxiv-5c9835ddf3a1.html</link>
<guid>https://arxiv.org/abs/2606.02576v1#2026-06-02#instruction-tuning</guid>
<pubDate>Tue, 02 Jun 2026 13:56:35 +0800</pubDate>
<description>Multimodal Large Language Models (MLLMs) achieve strong performance through instruction tuning, but real-world deployment requires them to continually acquire new vision-language capabilities, making Multimodal Continual Instruction Tuning (MCIT) essential. To reduce inter-task interference and promote collaboration, recent methods often employ sparse architectures like Mixture of LoRA Experts with image-text similarity routing. However, tasks with distinct response structures could share highl…</description>
</item>
<item>
<title>MAGIC: Multimodal Alignment &amp; Grounding-aware Instruction Coreset for Vision-Language Models</title>
<link>../papers/arxiv-36b948aa0972.html</link>
<guid>https://arxiv.org/abs/2605.26004v1#2026-05-26#instruction-tuning</guid>
<pubDate>Tue, 26 May 2026 13:09:24 +0800</pubDate>
<description>Instruction tuning of large vision-language models (LVLMs) increasingly depends on massive multimodal corpora, yet these datasets contain samples with substantial redundancy, low visual dependency, and highly imbalanced coverage of multimodal reasoning behaviors. As a result, uniform subsampling or naive score-based selection often yields suboptimal training subsets. We introduce MAGIC, a training-free, forward-only coreset selection method designed to construct compact yet behaviorally faithfu…</description>
</item>
<item>
<title>Dynamic Cross-Modal Prompt Generation for Multimodal Continual Instruction Tuning</title>
<link>../papers/arxiv-db5a48f1e422.html</link>
<guid>https://arxiv.org/abs/2605.10765v1#2026-05-12#instruction-tuning</guid>
<pubDate>Tue, 12 May 2026 12:42:08 +0800</pubDate>
<description>Multimodal Large Language Models (MLLMs) achieve strong performance through instruction tuning, yet real-world deployment often requires continual capability expansion across sequential tasks. In such scenarios, Multimodal Continual Instruction Tuning (MCIT) aims to acquire new capabilities while limiting catastrophic forgetting. Existing methods mainly follow a module-composition paradigm: they maintain task-level prompts or LoRA experts and dynamically route or aggregate a subset of them at i…</description>
</item>
<item>
<title>MAny: Merge Anything for Multimodal Continual Instruction Tuning</title>
<link>../papers/arxiv-b488936a3be9.html</link>
<guid>https://arxiv.org/abs/2604.14016v1#2026-04-16#instruction-tuning</guid>
<pubDate>Thu, 16 Apr 2026 11:43:00 +0800</pubDate>
<description>Multimodal Continual Instruction Tuning (MCIT) is essential for sequential task adaptation of Multimodal Large Language Models (MLLMs) but is severely restricted by catastrophic forgetting. While existing literature focuses on the reasoning language backbone, in this work, we expose a critical yet neglected dual-forgetting phenomenon across both perception drift in Cross-modal Projection Space and reasoning collapse in Low-rank Parameter Space. To resolve this, we present MAny (\textbf…</description>
</item>
</channel>
</rss>
