Keyword Tracking

Keyword tracking: video generation

This page continuously tracks the keywords you care about in your configuration and archives the matching papers by date.


Recent trend

The most recent hit came from Vision: ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis

[Trend chart: 2026-04-09 through 2026-04-22]

Hit details

Review, by date, the titles of papers that matched this keyword, along with their source feed.

2026-04-22

2026-04-22 11:37:03 (Asia/Shanghai)

Vision

ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis

View original source

Human video generation remains challenging due to the difficulty of jointly modeling human appearance, motion, and camera viewpoint under limited multi-view data. Existing methods…

Vision

MMControl: Unified Multi-Modal Control for Joint Audio-Video Generation

View original source

Recent advances in Diffusion Transformers (DiTs) have enabled high-quality joint audio-video generation, producing videos with synchronized audio within a single model. However, e…

Vision

How Far Are Video Models from True Multimodal Reasoning?

View original source

Despite remarkable progress toward general-purpose video models, a critical question remains unanswered: how far are these models from achieving true multimodal reasoning? Existin…

Vision

CityRAG: Stepping Into a City via Spatially-Grounded Video Generation

View original source

We address the problem of generating a 3D-consistent, navigable environment that is spatially grounded: a simulation of a real location. Existing video generative models can produ…

2026-04-21

2026-04-21 11:40:46 (Asia/Shanghai)

Vision

AdaCluster: Adaptive Query-Key Clustering for Sparse Attention in Video Generation

View original source

Video diffusion transformers (DiTs) suffer from prohibitive inference latency due to quadratic attention complexity. Existing sparse attention methods either overlook semantic sim…

Vision

OmniHuman: A Large-scale Dataset and Benchmark for Human-Centric Video Generation

View original source

Recent advancements in audio-video joint generation models have demonstrated impressive capabilities in content creation. However, generating high-fidelity human-centric videos in…

2026-04-17

2026-04-17 11:39:21 (Asia/Shanghai)

Vision

Flow of Truth: Proactive Temporal Forensics for Image-to-Video Generation

View original source

The rapid rise of image-to-video (I2V) generation enables realistic videos to be created from a single image but also brings new forensic demands. Unlike static images, I2V conten…

2026-04-16

2026-04-16 11:43:00 (Asia/Shanghai)

Vision

DiT as Real-Time Rerenderer: Streaming Video Stylization with Autoregressive Diffusion Transformer

View original source

Recent advances in video generation models have significantly accelerated video generation and related downstream tasks. Among these, video stylization holds important research val…

Vision

Seedance 2.0: Advancing Video Generation for World Complexity

View original source

Seedance 2.0 is a new native multi-modal audio-video generation model, officially released in China in early February 2026. Compared with its predecessors, Seedance 1.0 and 1.5 Pr…

2026-04-15

2026-04-15 11:35:50 (Asia/Shanghai)

Vision

Generative Refinement Networks for Visual Synthesis

View original source

While diffusion models dominate the field of visual generation, they are computationally inefficient, applying a uniform computational effort regardless of different complexity. I…

2026-04-14

2026-04-14 11:37:06 (Asia/Shanghai)

Vision

OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation

View original source

In this work, we study Human-Object Interaction Video Generation (HOIVG), which aims to synthesize high-quality human-object interaction videos conditioned on text, reference imag…

Vision

HDR Video Generation via Latent Alignment with Logarithmic Encoding

View original source

High dynamic range (HDR) imagery offers a rich and faithful representation of scene radiance, but remains challenging for generative models due to its mismatch with the bounded, p…

2026-04-08

2026-04-08 17:10:24 (Asia/Shanghai)

Vision

Action Images: End-to-End Policy Learning via Multiview Video Generation

View original source

World action models (WAMs) have emerged as a promising direction for robot policy learning, as they can leverage powerful video backbones to model the future states. However, exis…

Vision

OmniCamera: A Unified Framework for Multi-task Video Generation with Arbitrary Camera Control

View original source

Video fundamentally intertwines two crucial axes: the dynamic content of a scene and the camera motion through which it is observed. However, existing generation models often enta…

Vision

HumANDiff: Articulated Noise Diffusion for Motion-Consistent Human Video Generation

View original source

Despite tremendous recent progress in human video generation, generative video diffusion models still struggle to capture the dynamics and physics of human motions faithfully. In…