😃 Hi!

I am Yujie Wei (卫昱杰), a fourth-year Ph.D. student at Fudan University, advised by Prof. Hongming Shan. I received my Bachelor’s degree in Software Engineering from Sichuan University, advised by Prof. Yi Zhang.

My research interests include 2D/3D Generative Models and Representation Learning, with a particular focus on:

  • Video and Image Generation (specifically Customization and Controllable Generation)

  • Foundation Model Architecture

  • Self-Supervised Learning

🔥 News

  • 2026.01: 🎉 ProMoE is accepted by ICLR 2026.
  • 2025.09: 🎉 RepLDM and TTS-VAR are accepted by NeurIPS 2025. Honored to collaborate with the co-authors on these promising projects.
  • 2025.06: 🎉 DreamRelation, FreeScale, and PersonalVideo are accepted by ICCV 2025. Honored to collaborate with the co-authors on these promising projects.
  • 2025.03: 🎉 TeaCache is accepted by CVPR 2025 as a Highlight. Congrats to Feng.
  • 2024.09: 🎉 EvolveDirector is accepted by NeurIPS 2024. Congrats to Rui.
  • 2024.02: 🎉 DreamVideo, InstructVideo, and HiGen are accepted by CVPR 2024. Honored to collaborate with the co-authors on these promising projects.
  • 2023.08: 🎉 Emo-DNA is accepted by ACM MM 2023. Congrats to Jiaxin.
  • 2023.07: 🎉 OnPro is accepted by ICCV 2023.
  • 2023.02: 🎉 Temporal Modeling Matters is accepted by ICASSP 2023. Congrats to Jiaxin.

๐Ÿ“ Publications

Selected Publications

ICLR 2026

[Model Architecture] Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance
Yujie Wei, Shiwei Zhang, Hangjie Yuan, Yujin Han, Zhekai Chen, Jiayu Wang, Difan Zou, Xihui Liu, Yingya Zhang, Yu Liu, Hongming Shan

  • ProMoE is an MoE framework featuring a two-step router with explicit routing guidance that promotes expert specialization.
ICCV 2025

[Video Generation] DreamRelation: Relation-Centric Video Customization
Yujie Wei, Shiwei Zhang, Hangjie Yuan, Biao Gong, Longxiang Tang, Xiang Wang, Haonan Qiu, Hengjia Li, Shuai Tan, Yingya Zhang, Hongming Shan

[Project page]

  • DreamRelation is the first relational video customization method that personalizes user-specified relations.
CVPR 2024

[Video Generation] DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
Yujie Wei, Shiwei Zhang, Zhiwu Qing, Hangjie Yuan, Zhiheng Liu, Yu Liu, Yingya Zhang, Jingren Zhou, Hongming Shan

[Project page]

  • DreamVideo is the first method that generates customized videos from a few static images of the desired subject and a few videos of target motion.
ICCV 2023

[Continual Learning] Online Prototype Learning for Online Continual Learning
Yujie Wei, Jiaxin Ye, Zhizhong Huang, Junping Zhang, Hongming Shan

[Code]

  • OnPro is the first work to identify shortcut learning as the key limiting factor for online continual learning, offering new insights into why online learning models fail to generalize well.
arXiv preprint

[Video Generation] DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control
Yujie Wei, Shiwei Zhang, Hangjie Yuan, Xiang Wang, Haonan Qiu, Rui Zhao, Yutong Feng, Feng Liu, Zhizhong Huang, Jiaxin Ye, Yingya Zhang, Hongming Shan

[Project page]

  • DreamVideo-2 is the first zero-shot (tuning-free) framework that generates customized videos with specified subjects and motion trajectories.

Collaborative Publications

ICCV 2025

FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
Haonan Qiu, Shiwei Zhang, Yujie Wei, Ruihang Chu, Hangjie Yuan, Xiang Wang, Yingya Zhang, Ziwei Liu

[Project page] [Code]

  • FreeScale proposes a tuning-free inference paradigm to enable higher-resolution visual generation via scale fusion.
CVPR 2025 Highlight

Timestep Embedding Tells: It’s Time to Cache for Video Diffusion Model
Feng Liu, Shiwei Zhang, Xiaofeng Wang, Yujie Wei, Haonan Qiu, Yuzhong Zhao, Yingya Zhang, Qixiang Ye, Fang Wan

[Project page] [Code]

  • TeaCache is a training-free caching approach that estimates and leverages the fluctuating differences among model outputs across timesteps.
NeurIPS 2024

EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models
Rui Zhao, Hangjie Yuan, Yujie Wei, Shiwei Zhang, Yuchao Gu, Lingmin Ran, Xiang Wang, Zhangjie Wu, Junhao Zhang, Yingya Zhang, Mike Zheng Shou

[Code]

  • EvolveDirector explores the feasibility of training a text-to-image generation model comparable to advanced models using publicly available resources.
CVPR 2024

InstructVideo: Instructing Video Diffusion Models with Human Feedback
Hangjie Yuan, Shiwei Zhang, Xiang Wang, Yujie Wei, Tao Feng, Yining Pan, Yingya Zhang, Ziwei Liu, Samuel Albanie, Dong Ni

[Project page]

  • InstructVideo is the first research attempt to instruct video diffusion models with human feedback.
CVPR 2024

Hierarchical Spatio-Temporal Decoupling for Text-to-Video Generation
Zhiwu Qing, Shiwei Zhang, Jiayu Wang, Xiang Wang, Yujie Wei, Yingya Zhang, Changxin Gao, Nong Sang

[Project page]

  • HiGen is a method that improves T2V performance by decoupling the spatial and temporal factors of videos at both the structure and content levels.

🎖 Honors and Awards

  • 2022.09 Fudan University Zhicheng Freshman Second Prize Scholarship (Top 5%)
  • 2022.05 Outstanding Graduate of Sichuan Province and Sichuan University
  • 2021.10 National Scholarship (Top 1%)
  • 2020.10 The First Prize Scholarship (Top 3%)
  • 2020.05 Sichuan University Top 100 Student Leaders
  • 2019.10 National Scholarship (Top 1%)

🎓 Academic Service

  • Conference Reviewer: ICLR, CVPR, ICCV, ACM MM, NeurIPS, SIGGRAPH Asia, ICML.
  • Journal Reviewer: TPAMI, TIP, Information Fusion.

💬 Invited Talks

  • 2024.11, Customized Image & Video Generation, 3D视觉工坊 & 3DCV & 计算机视觉工坊 | [Link] | [Video]

📖 Education

  • 2022.09 - 2027.06 (in progress), Ph.D., Fudan University, Shanghai, China.
  • 2018.09 - 2022.06, Bachelor of Software Engineering, Sichuan University, Chengdu, China.