<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>segmentation Topic Archive</title>
<link>segmentation.html</link>
<description>Long-running RSS feed tracking the keyword "segmentation", aggregating all historically matched publications.</description>
<language>en-US</language>
<lastBuildDate>Wed, 22 Apr 2026 03:37:20 +0000</lastBuildDate>
<item>
<title>PanDA: Unsupervised Domain Adaptation for Multimodal 3D Panoptic Segmentation in Autonomous Driving</title>
<link>../papers/arxiv-523377f05ec5.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.19379v1#2026-04-22#segmentation</guid>
<pubDate>Wed, 22 Apr 2026 11:37:03 +0800</pubDate>
<description>This paper presents the first study on Unsupervised Domain Adaptation (UDA) for multimodal 3D panoptic segmentation (mm-3DPS), aiming to improve generalization under domain shifts commonly encountered in real-world autonomous driving. A straightforward solution is to employ a pseudo-labeling strategy, which is widely used in UDA to generate supervision for unlabeled target data, combined with an mm-3DPS backbone. However, existing supervised mm-3DPS methods rely heavily on strong cross-modal co…</description>
</item>
<item>
<title>MedFlowSeg: Flow Matching for Medical Image Segmentation with Frequency-Aware Attention</title>
<link>../papers/arxiv-7e457ae682db.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.19675v1#2026-04-22#segmentation</guid>
<pubDate>Wed, 22 Apr 2026 11:37:03 +0800</pubDate>
<description>Flow matching has recently emerged as a principled framework for learning continuous-time transport maps, enabling efficient deterministic generation without relying on stochastic diffusion processes. While generative modeling has shown promise for medical image segmentation, particularly in capturing uncertainty and complex anatomical variability, existing approaches are predominantly built upon diffusion models, which incur substantial computational overhead due to iterative sampling and are…</description>
</item>
<item>
<title>RF-HiT: Rectified Flow Hierarchical Transformer for General Medical Image Segmentation</title>
<link>../papers/arxiv-ab2855fef1f1.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.19570v1#2026-04-22#segmentation</guid>
<pubDate>Wed, 22 Apr 2026 11:37:03 +0800</pubDate>
<description>Accurate medical image segmentation requires both long-range contextual reasoning and precise boundary delineation, a task where existing transformer- and diffusion-based paradigms are frequently bottlenecked by quadratic computational complexity and prohibitive inference latency. We propose RF-HiT, a Rectified Flow Hierarchical Transformer that integrates an hourglass transformer backbone with a multi-scale hierarchical encoder for anatomically guided feature conditioning. Unlike prior diffusi…</description>
</item>
<item>
<title>DiffuSAM: Diffusion Guided Zero-Shot Object Grounding for Remote Sensing Imagery</title>
<link>../papers/arxiv-18b224b59dda.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.18201v1#2026-04-21#segmentation</guid>
<pubDate>Tue, 21 Apr 2026 11:40:46 +0800</pubDate>
<description>Diffusion models have emerged as powerful tools for a wide range of vision tasks, including text-guided image generation and editing. In this work, we explore their potential for object grounding in remote sensing imagery. We propose a hybrid pipeline that integrates diffusion-based localization cues with state-of-the-art segmentation models such as RemoteSAM and SAM3 to obtain more accurate bounding boxes. By leveraging the complementary strengths of generative diffusion models and foundationa…</description>
</item>
<item>
<title>Weakly-Supervised Referring Video Object Segmentation through Text Supervision</title>
<link>../papers/arxiv-ccd0dd55c2f1.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.17797v1#2026-04-21#segmentation</guid>
<pubDate>Tue, 21 Apr 2026 11:40:46 +0800</pubDate>
<description>Referring video object segmentation (RVOS) aims to segment the target instance in a video referred to by a text expression. Conventional approaches mostly rely on supervised learning, which requires expensive pixel-level mask annotations. To reduce this cost, weakly-supervised RVOS has recently been proposed, replacing mask annotations with bounding boxes or points, which are nevertheless still costly and labor-intensive. In this paper, we design a novel weakly-supervised RVOS method, namely WSRVOS, to train the mode…</description>
</item>
<item>
<title>AnchorSeg: Language Grounded Query Banks for Reasoning Segmentation</title>
<link>../papers/arxiv-28c2e2bc1523.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.18562v1#2026-04-21#segmentation</guid>
<pubDate>Tue, 21 Apr 2026 11:40:46 +0800</pubDate>
<description>Reasoning segmentation requires models to ground complex, implicit textual queries into precise pixel-level masks. Existing approaches rely on a single segmentation token $\texttt{&lt;SEG&gt;}$, whose hidden state implicitly encodes both semantic reasoning and spatial localization, limiting the model&#x27;s ability to explicitly disentangle what to segment from where to segment. We introduce AnchorSeg, which reformulates reasoning segmentation as a structured conditional generation process over image toke…</description>
</item>
<item>
<title>DSA-CycleGAN: A Domain Shift Aware CycleGAN for Robust Multi-Stain Glomeruli Segmentation</title>
<link>../papers/arxiv-d4a5ecc75743.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.18368v1#2026-04-21#segmentation</guid>
<pubDate>Tue, 21 Apr 2026 11:40:46 +0800</pubDate>
<description>A key challenge in segmentation for digital histopathology is inter- and intra-stain variation, which reduces model performance. Labelling each stain is expensive and time-consuming, so methods using stain transfer via CycleGAN have been developed to train multi-stain segmentation models using labels from a single stain. Nevertheless, CycleGAN tends to introduce noise during translation because of the one-to-many nature of some stain pairs, which conflicts with its cycle consistency loss. To…</description>
</item>
<item>
<title>SegWithU: Uncertainty as Perturbation Energy for Single-Forward-Pass Risk-Aware Medical Image Segmentation</title>
<link>../papers/arxiv-23036fba0e62.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.15271v1#2026-04-17#segmentation</guid>
<pubDate>Fri, 17 Apr 2026 11:39:21 +0800</pubDate>
<description>Reliable uncertainty estimation is critical for medical image segmentation, where automated contours feed downstream quantification and clinical decision support. Many strong uncertainty methods require repeated inference, while efficient single-forward-pass alternatives often provide weaker failure ranking or rely on restrictive feature-space assumptions. We present $\textbf{SegWithU}$, a post-hoc framework that augments a frozen pretrained segmentation backbone with a lightweight uncertainty…</description>
</item>
<item>
<title>Unsupervised Skeleton-Based Action Segmentation via Hierarchical Spatiotemporal Vector Quantization</title>
<link>../papers/arxiv-5879454db7c6.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.15196v1#2026-04-17#segmentation</guid>
<pubDate>Fri, 17 Apr 2026 11:39:21 +0800</pubDate>
<description>We propose a novel hierarchical spatiotemporal vector quantization framework for unsupervised skeleton-based temporal action segmentation. We first introduce a hierarchical approach, which includes two consecutive levels of vector quantization. Specifically, the lower level associates skeletons with fine-grained subactions, while the higher level further aggregates subactions into action-level representations. Our hierarchical approach outperforms the non-hierarchical baseline, while primarily…</description>
</item>
<item>
<title>Boundary-Centric Active Learning for Temporal Action Segmentation</title>
<link>../papers/arxiv-d84f4cfc7217.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.15173v1#2026-04-17#segmentation</guid>
<pubDate>Fri, 17 Apr 2026 11:39:21 +0800</pubDate>
<description>Temporal action segmentation (TAS) demands dense temporal supervision, yet most of the annotation cost in untrimmed videos is spent identifying and refining action transitions, where segmentation errors concentrate and small temporal shifts disproportionately degrade segmental metrics. We introduce B-ACT, a clip-budgeted active learning framework that explicitly allocates supervision to these high-leverage boundary regions. B-ACT operates in a hierarchical two-stage loop: (i) it ranks and queri…</description>
</item>
<item>
<title>Efficient Search of Implantable Adaptive Cells for Medical Image Segmentation</title>
<link>../papers/arxiv-88d2221df05a.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.14849v1#2026-04-17#segmentation</guid>
<pubDate>Fri, 17 Apr 2026 11:39:21 +0800</pubDate>
<description>Purpose: Adaptive skip modules can improve medical image segmentation, but searching for them is computationally costly. Implantable Adaptive Cells (IACs) are compact NAS modules inserted into U-Net skip connections, reducing the search space compared with full-network NAS. However, the original IAC framework still requires a 200-epoch differentiable search for each backbone and dataset. Methods: We analyzed the temporal behavior of operations and edges within IAC cells during differentiable se…</description>
</item>
<item>
<title>From Boundaries to Semantics: Prompt-Guided Multi-Task Learning for Petrographic Thin-section Segmentation</title>
<link>../papers/arxiv-0f86f7993414.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.14805v1#2026-04-17#segmentation</guid>
<pubDate>Fri, 17 Apr 2026 11:39:21 +0800</pubDate>
<description>Grain-edge segmentation (GES) and lithology semantic segmentation (LSS) are two pivotal tasks for quantifying rock fabric and composition. However, these two tasks are often treated separately, and segmentation quality remains unsatisfactory even though expensive, time-consuming, expert-annotated datasets have been used. Recently, foundation models, especially the Segment Anything Model (SAM), have demonstrated impressive robustness for boundary alignment. However, directly adapting SAM to joint GES…</description>
</item>
<item>
<title>From Image to Pixels: towards Fine-Grained Medical Vision-Language Models.</title>
<link>../papers/doi-71303bb82f13.html</link>
<guid isPermaLink="false">https://pubmed.ncbi.nlm.nih.gov/41989909/#2026-04-17#segmentation</guid>
<pubDate>Fri, 17 Apr 2026 11:39:21 +0800</pubDate>
<description>Multimodal large language models (MLLMs) offer immense potential for biomedical AI, yet current applications remain limited to coarse-grained image understanding and basic textual queries, falling short of the fine-grained reasoning required in clinical contexts. In this work, we present a comprehensive solution spanning data, model, and training innovations to advance pixel-level multimodal intelligence in biomedicine. First, we construct MeCoVQA, a new visual-language benchmark that spans eigh…</description>
</item>
<item>
<title>ROSE: Retrieval-Oriented Segmentation Enhancement</title>
<link>../papers/arxiv-e008501b0fb5.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.14147v1#2026-04-16#segmentation</guid>
<pubDate>Thu, 16 Apr 2026 11:43:00 +0800</pubDate>
<description>Existing segmentation models based on multimodal large language models (MLLMs), such as LISA, often struggle with novel or emerging entities due to their inability to incorporate up-to-date knowledge. To address this challenge, we introduce the Novel Emerging Segmentation Task (NEST), which focuses on segmenting (i) novel entities that MLLMs fail to recognize due to their absence from training data, and (ii) emerging entities that exist within the model&#x27;s knowledge but demand up-to-date externa…</description>
</item>
<item>
<title>Decoding the Delta: Unifying Remote Sensing Change Detection and Understanding with Multimodal Large Language Models</title>
<link>../papers/arxiv-edb7485d7898.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.14044v1#2026-04-16#segmentation</guid>
<pubDate>Thu, 16 Apr 2026 11:43:00 +0800</pubDate>
<description>While Multimodal Large Language Models (MLLMs) excel in general vision-language tasks, their application to remote sensing change understanding is hindered by a fundamental &quot;temporal blindness&quot;. Existing architectures lack intrinsic mechanisms for multi-temporal contrastive reasoning and struggle with precise spatial grounding. To address this, we first introduce Delta-QA, a comprehensive benchmark comprising 180k visual question-answering samples. Delta-QA unifies pixel-level segmentation and…</description>
</item>
<item>
<title>PBE-UNet: A Lightweight Progressive Boundary-Enhanced U-Net with Scale-Aware Aggregation for Ultrasound Image Segmentation</title>
<link>../papers/arxiv-f0b69cb6a500.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.13791v1#2026-04-16#segmentation</guid>
<pubDate>Thu, 16 Apr 2026 11:43:00 +0800</pubDate>
<description>Accurate lesion segmentation in ultrasound images is essential for preventive screening and clinical diagnosis, yet remains challenging due to low contrast, blurry boundaries, and significant scale variations. Although existing deep learning-based methods have achieved remarkable performance, these methods still struggle with scale variations and indistinct tumor boundaries. To address these challenges, we propose a progressive boundary-enhanced U-Net (PBE-UNet). Specifically, we first introduce a…</description>
</item>
<item>
<title>Design and Behavior of Sparse Mixture-of-Experts Layers in CNN-based Semantic Segmentation</title>
<link>../papers/arxiv-1315c3054cdc.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.13761v1#2026-04-16#segmentation</guid>
<pubDate>Thu, 16 Apr 2026 11:43:00 +0800</pubDate>
<description>Sparse mixture-of-experts (MoE) layers have been shown to substantially increase model capacity without a proportional increase in computational cost and are widely used in transformer architectures, where they typically replace feed-forward network blocks. In contrast, integrating sparse MoE layers into convolutional neural networks (CNNs) remains inconsistent, with most prior work focusing on fine-grained MoEs operating at the filter or channel levels. In this work, we investigate a coarser,…</description>
</item>
<item>
<title>RSGMamba: Reliability-Aware Self-Gated State Space Model for Multimodal Semantic Segmentation</title>
<link>../papers/arxiv-3027c60dff03.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.12319v1#2026-04-15#segmentation</guid>
<pubDate>Wed, 15 Apr 2026 11:35:50 +0800</pubDate>
<description>Multimodal semantic segmentation has emerged as a powerful paradigm for enhancing scene understanding by leveraging complementary information from multiple sensing modalities (e.g., RGB, depth, and thermal). However, existing cross-modal fusion methods often implicitly assume that all modalities are equally reliable, which can lead to feature degradation when auxiliary modalities are noisy, misaligned, or incomplete. In this paper, we revisit cross-modal fusion from the perspective of modality…</description>
</item>
<item>
<title>All in One: A Unified Synthetic Data Pipeline for Multimodal Video Understanding</title>
<link>../papers/arxiv-ba711ee91078.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.12335v1#2026-04-15#segmentation</guid>
<pubDate>Wed, 15 Apr 2026 11:35:50 +0800</pubDate>
<description>Training multimodal large language models (MLLMs) for video understanding requires large-scale annotated data spanning diverse tasks such as object counting, question answering, and segmentation. However, collecting and annotating multimodal video data in the real world is costly, slow, and inherently limited in diversity and coverage. To address this challenge, we propose a unified synthetic data generation pipeline capable of automatically producing unlimited multimodal video data with rich and d…</description>
</item>
<item>
<title>Radar-Camera BEV Multi-Task Learning with Cross-Task Attention Bridge for Joint 3D Detection and Segmentation</title>
<link>../papers/arxiv-f59d4565b4bc.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.12918v1#2026-04-15#segmentation</guid>
<pubDate>Wed, 15 Apr 2026 11:35:50 +0800</pubDate>
<description>Bird&#x27;s-eye-view (BEV) representations are the dominant paradigm for 3D perception in autonomous driving, providing a unified spatial canvas where detection and segmentation features are geometrically registered to the same physical coordinate system. However, existing radar-camera fusion methods treat these tasks in isolation, missing the opportunity to share complementary information between them: detection features encode object-level geometry that can sharpen segmentation boundaries, while s…</description>
</item>
<item>
<title>Detecting and refurbishing ground truth errors during training of deep learning-based echocardiography segmentation models</title>
<link>../papers/arxiv-3b9fc3f09edf.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.12832v1#2026-04-15#segmentation</guid>
<pubDate>Wed, 15 Apr 2026 11:35:50 +0800</pubDate>
<description>Deep learning-based medical image segmentation typically relies on ground truth (GT) labels obtained through manual annotation, but these can be prone to random errors or systematic biases. This study examines the robustness of deep learning models to such errors in echocardiography (echo) segmentation and evaluates a novel strategy for detecting and refurbishing erroneous labels during model training. Using the CAMUS dataset, we simulate three error types, then compare a loss-based GT label er…</description>
</item>
<item>
<title>LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation</title>
<link>../papers/arxiv-be45283d75a9.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.11789v1#2026-04-14#segmentation</guid>
<pubDate>Tue, 14 Apr 2026 11:37:06 +0800</pubDate>
<description>Large Multimodal Models (LMMs) have achieved remarkable progress in general-purpose vision-language understanding, yet they remain limited in tasks requiring precise object-level grounding, fine-grained spatial reasoning, and controllable visual manipulation. In particular, existing systems often struggle to identify the correct instance, preserve object identity across interactions, and localize or modify designated regions with high precision. Object-centric vision provides a principled fram…</description>
</item>
<item>
<title>GeomPrompt: Geometric Prompt Learning for RGB-D Semantic Segmentation Under Missing and Degraded Depth</title>
<link>../papers/arxiv-a58e7d937629.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.11585v1#2026-04-14#segmentation</guid>
<pubDate>Tue, 14 Apr 2026 11:37:06 +0800</pubDate>
<description>Multimodal perception systems for robotics and embodied AI often assume reliable RGB-D sensing, but in practice, depth is frequently missing, noisy, or corrupted. We thus present GeomPrompt, a lightweight cross-modal adaptation module that synthesizes a task-driven geometric prompt from RGB alone for the fourth channel of a frozen RGB-D semantic segmentation model, without depth supervision. We further introduce GeomPrompt-Recovery, an adaptation module that compensates for degraded depth by pr…</description>
</item>
<item>
<title>Budget-Aware Uncertainty for Radiotherapy Segmentation QA Using nnU-Net</title>
<link>../papers/arxiv-dd7f6721d11f.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.11798v1#2026-04-14#segmentation</guid>
<pubDate>Tue, 14 Apr 2026 11:37:06 +0800</pubDate>
<description>Accurate delineation of the Clinical Target Volume (CTV) is essential for radiotherapy planning, yet remains time-consuming and difficult to assess, especially for complex treatments such as Total Marrow and Lymph Node Irradiation (TMLI). While deep learning-based auto-segmentation can reduce workload, safe clinical deployment requires reliable cues indicating where models may be wrong. In this work, we propose a budget-aware uncertainty-driven quality assurance (QA) framework built on nnU-Net,…</description>
</item>
<item>
<title>Efficient KernelSHAP Explanations for Patch-based 3D Medical Image Segmentation</title>
<link>../papers/arxiv-f426fe16d894.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.11775v1#2026-04-14#segmentation</guid>
<pubDate>Tue, 14 Apr 2026 11:37:06 +0800</pubDate>
<description>Perturbation-based explainability methods such as KernelSHAP provide model-agnostic attributions but are typically impractical for patch-based 3D medical image segmentation due to the large number of coalition evaluations and the high cost of sliding-window inference. We present an efficient KernelSHAP framework for volumetric CT segmentation that restricts computation to a user-defined region of interest and its receptive-field support, and accelerates inference via patch logit caching, reusin…</description>
</item>
<item>
<title>Seeing Through the Tool: A Controlled Benchmark for Occlusion Robustness in Foundation Segmentation Models</title>
<link>../papers/arxiv-850cced4e5aa.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.11711v1#2026-04-14#segmentation</guid>
<pubDate>Tue, 14 Apr 2026 11:37:06 +0800</pubDate>
<description>Occlusion, where target structures are partially hidden by surgical instruments or overlapping tissues, remains a critical yet underexplored challenge for foundation segmentation models in clinical endoscopy. We introduce OccSAM-Bench, a benchmark designed to systematically evaluate SAM-family models under controlled, synthesized surgical occlusion. Our framework simulates two occlusion types (i.e., surgical tool overlay and cutout) across three calibrated severity levels on three public polyp…</description>
</item>
<item>
<title>Text4Seg++: Advancing Image Segmentation via Generative Language Modeling.</title>
<link>../papers/doi-b67edb02c604.html</link>
<guid isPermaLink="false">https://pubmed.ncbi.nlm.nih.gov/41973591/#2026-04-14#segmentation</guid>
<pubDate>Tue, 14 Apr 2026 11:37:06 +0800</pubDate>
<description>Multimodal Large Language Models (MLLMs) have shown exceptional capabilities in vision-language tasks. However, effectively integrating image segmentation into these models remains a significant challenge. In this work, we propose a novel text-as-mask paradigm that casts image segmentation as a text generation problem, eliminating the need for additional decoders and significantly simplifying the segmentation process. Our key innovation is semantic descriptors, a new textual representation of s…</description>
</item>
<item>
<title>Multi-Modal Landslide Detection from Sentinel-1 SAR and Sentinel-2 Optical Imagery Using Multi-Encoder Vision Transformers and Ensemble Learning</title>
<link>../papers/arxiv-42e674d40a9d.html</link>
<guid isPermaLink="false">https://arxiv.org/abs/2604.05959v1#2026-04-08#segmentation</guid>
<pubDate>Wed, 08 Apr 2026 17:10:24 +0800</pubDate>
<description>Landslides represent a major geohazard with severe impacts on human life, infrastructure, and ecosystems, underscoring the need for accurate and timely detection approaches to support disaster risk reduction. This study proposes a modular, multi-model framework that fuses Sentinel-2 optical imagery with Sentinel-1 Synthetic Aperture Radar (SAR) data for robust landslide detection. The methodology leverages multi-encoder vision transformers, where each data modality is processed through separat…</description>
</item>
</channel>
</rss>
