Keyword Tracking

Keyword tracking: diffusion

This page tracks the keywords you care about in your configuration over the long term and archives the matching papers by date.

Recent trend

The most recent hit came from Vision: Diff-SBSR: Learning Multimodal Feature-Enhanced Diffusion Models for Zero-Shot Sketch-Based 3D Shape Retrieval

[Trend chart: date axis from 2026-04-09 to 2026-04-22]

Hit details

Review, by date, the paper titles that matched this keyword, with the source feed information retained.
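As a rough illustration of how a listing like this could be assembled, the sketch below filters feed items by keyword and groups the hits by date. Everything in it is an assumption made for illustration: the `Paper` record, the `group_hits_by_date` helper, and the choice to match case-insensitively against both title and summary are hypothetical and not taken from this page's actual implementation. The third sample item is a made-up non-matching entry.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    date: str     # ISO date of the fetch, e.g. "2026-04-22"
    feed: str     # source feed label, e.g. "Vision"
    title: str
    summary: str


def group_hits_by_date(papers, keyword):
    """Return {date: [Paper, ...]} for papers mentioning the keyword.

    Assumption: a hit is a case-insensitive substring match against
    the title or the summary.
    """
    needle = keyword.casefold()
    hits = defaultdict(list)
    for paper in papers:
        text = f"{paper.title} {paper.summary}".casefold()
        if needle in text:
            hits[paper.date].append(paper)
    # Newest date first, mirroring the order of the listing.
    return dict(sorted(hits.items(), reverse=True))


sample = [
    Paper("2026-04-08", "Vision",
          "DiffHDR: Re-Exposing LDR Videos with Video Diffusion Models", ""),
    Paper("2026-04-22", "Vision",
          "AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model", ""),
    Paper("2026-04-22", "LLM",
          "A placeholder title with no match", ""),  # hypothetical item
]

grouped = group_hits_by_date(sample, "diffusion")
for date, items in grouped.items():
    print(date, [(p.feed, p.title) for p in items])
```

Keeping the feed label on each record is what lets the listing show the source next to every title.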

2026-04-22

2026-04-22 11:37:03 (Asia/Shanghai)

Vision

Diff-SBSR: Learning Multimodal Feature-Enhanced Diffusion Models for Zero-Shot Sketch-Based 3D Shape Retrieval

View original source

This paper presents the first exploration of text-to-image diffusion models for zero-shot sketch-based 3D shape retrieval (ZS-SBSR). Existing sketch-based 3D shape retrieval metho…

Vision

ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis

View original source

Human video generation remains challenging due to the difficulty of jointly modeling human appearance, motion, and camera viewpoint under limited multi-view data. Existing methods…

Vision

MMControl: Unified Multi-Modal Control for Joint Audio-Video Generation

View original source

Recent advances in Diffusion Transformers (DiTs) have enabled high-quality joint audio-video generation, producing videos with synchronized audio within a single model. However, e…

Vision

MedFlowSeg: Flow Matching for Medical Image Segmentation with Frequency-Aware Attention

View original source

Flow matching has recently emerged as a principled framework for learning continuous-time transport maps, enabling efficient deterministic generation without relying on stochastic…

Vision

RF-HiT: Rectified Flow Hierarchical Transformer for General Medical Image Segmentation

View original source

Accurate medical image segmentation requires both long-range contextual reasoning and precise boundary delineation, a task where existing transformer- and diffusion-based paradigm…

Vision

EgoMotion: Hierarchical Reasoning and Diffusion for Egocentric Vision-Language Motion Generation

View original source

Faithfully modeling human behavior in dynamic environments is a foundational challenge for embodied intelligence. While conditional motion synthesis has achieved significant advan…

Vision

AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model

View original source

Sparse-view 3D reconstruction is essential for modeling scenes from casual captures, but remains challenging for non-generative reconstruction. Existing diffusion-based approaches…

2026-04-21

2026-04-21 11:40:46 (Asia/Shanghai)

Vision

AdaCluster: Adaptive Query-Key Clustering for Sparse Attention in Video Generation

View original source

Video diffusion transformers (DiTs) suffer from prohibitive inference latency due to quadratic attention complexity. Existing sparse attention methods either overlook semantic sim…

Vision

DiffuSAM: Diffusion Guided Zero-Shot Object Grounding for Remote Sensing Imagery

View original source

Diffusion models have emerged as powerful tools for a wide range of vision tasks, including text-guided image generation and editing. In this work, we explore their potential for…

Vision

UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

View original source

Uniform Discrete Diffusion Model (UDM) has recently emerged as a promising paradigm for discrete generative modeling; however, its integration with reinforcement learning remains…

Vision

One-Step Diffusion with Inverse Residual Fields for Unsupervised Industrial Anomaly Detection

View original source

Diffusion models have achieved outstanding performance in unsupervised industrial anomaly detection (uIAD) by learning a manifold of normal data under the common assumption that o…

Vision

Denoise and Align: Diffusion-Driven Foreground Knowledge Prompting for Open-Vocabulary Temporal Action Detection

View original source

Open-Vocabulary Temporal Action Detection (OV-TAD) aims to localize and classify action segments of unseen categories in untrimmed videos, where effective alignment between action…

2026-04-17

2026-04-17 11:39:21 (Asia/Shanghai)

Vision

An Analysis of Regularization and Fokker-Planck Residuals in Diffusion Models for Image Generation

View original source

Recent work has shown that diffusion models trained with the denoising score matching (DSM) objective often violate the Fokker-Planck (FP) equation that governs the evolution of…

Vision

RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework

View original source

High-level autonomous driving requires motion planners capable of modeling multimodal future uncertainties while remaining robust in closed-loop interactions. Although diffusion-b…

2026-04-16

2026-04-16 11:43:00 (Asia/Shanghai)

Vision

Free Lunch for Unified Multimodal Models: Enhancing Generation via Reflective Rectification with Inherent Understanding

View original source

Unified Multimodal Models (UMMs) aim to integrate visual understanding and generation within a single structure. However, these models exhibit a notable capability mismatch, where…

Vision

DiT as Real-Time Rerenderer: Streaming Video Stylization with Autoregressive Diffusion Transformer

View original source

Recent advances in video generation models have significantly accelerated video generation and related downstream tasks. Among these, video stylization holds important research val…

Vision

Remote Sensing Image Super-Resolution for Imbalanced Textures: A Texture-Aware Diffusion Framework

View original source

Generative diffusion priors have recently achieved state-of-the-art performance in natural image super-resolution, demonstrating a powerful capability to synthesize photorealistic…

Vision

Blind Bitstream-corrupted Video Recovery via Metadata-guided Diffusion Model

View original source

Bitstream-corrupted video recovery aims to restore realistic content degraded during video storage or transmission. Existing methods typically assume that predefined masks of corr…

2026-04-15

2026-04-15 11:35:50 (Asia/Shanghai)

Vision

AbdomenGen: Sequential Volume-Conditioned Diffusion Framework for Abdominal Anatomy Generation

View original source

Computational phantoms are widely used in medical imaging research, yet current systems to generate controlled, clinically meaningful anatomical variations remain limited. We pres…

Vision

Generative Refinement Networks for Visual Synthesis

View original source

While diffusion models dominate the field of visual generation, they are computationally inefficient, applying a uniform computational effort regardless of different complexity. I…

Vision

Fragile Reconstruction: Adversarial Vulnerability of Reconstruction-Based Detectors for Diffusion-Generated Images

View original source

Recently, detecting AI-generated images produced by diffusion-based models has attracted increasing attention due to their potential threat to safety. Among existing approaches, r…

PubMed AI

VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model.

View original source

The emergence of Large Vision-Language Models (LVLMs) marks significant strides towards achieving general artificial intelligence. However, these advancements are accompanied by c…

2026-04-14

2026-04-14 11:37:06 (Asia/Shanghai)

LLM

Multi-ORFT: Stable Online Reinforcement Fine-Tuning for Multi-Agent Diffusion Planning in Cooperative Driving

View original source

Closed-loop cooperative driving requires planners that generate realistic multimodal multi-agent trajectories while improving safety and traffic efficiency. Existing diffusion pla…

Vision

Anthropogenic Regional Adaptation in Multimodal Vision-Language Model

View original source

While the field of vision-language (VL) has achieved remarkable success in integrating visual and textual information across multiple languages and domains, there is still no dedi…

Vision

GazeVaLM: A Multi-Observer Eye-Tracking Benchmark for Evaluating Clinical Realism in AI-Generated X-Rays

View original source

We introduce GazeVaLM, a public eye-tracking dataset for studying clinical perception during chest radiograph authenticity assessment. The dataset comprises 960 gaze recordings fr…

Vision

Progressively Texture-Aware Diffusion for Contrast-Enhanced Sparse-View CT

View original source

Diffusion-based sparse-view CT (SVCT) imaging has achieved remarkable advancements in recent years, thanks to its more stable generative capability. However, recovering reliable i…

2026-04-08

2026-04-08 17:10:24 (Asia/Shanghai)

Vision

DiffHDR: Re-Exposing LDR Videos with Video Diffusion Models

View original source

Most digital videos are stored in 8-bit low dynamic range (LDR) formats, where much of the original high dynamic range (HDR) scene radiance is lost due to saturation and quantizat…

Vision

SEM-ROVER: Semantic Voxel-Guided Diffusion for Large-Scale Driving Scene Generation

View original source

Scalable generation of outdoor driving scenes requires 3D representations that remain consistent across multiple viewpoints and scale to large areas. Existing solutions either rel…

Vision

HumANDiff: Articulated Noise Diffusion for Motion-Consistent Human Video Generation

View original source

Despite tremendous recent progress in human video generation, generative video diffusion models still struggle to capture the dynamics and physics of human motions faithfully. In…