The First Workshop on
Video Generative Models: Benchmarks and Evaluation
CVPR 2026
Exploring Challenges and Opportunities in Evaluating and Benchmarking Video Generative Models
(TBD) June 3–7, 2026
Denver, Colorado, United States
About the Workshop
The rapid advancement of video generative models underscores the critical need for robust evaluation
methodologies capable of rigorously assessing instruction adherence, physical plausibility, human
fidelity, and creativity. However, prevailing metrics and benchmarks remain limited: they
predominantly prioritize semantic alignment while overlooking subtle yet critical artifacts
(structural distortions, unnatural motion dynamics, weak temporal coherence) that persist
even in state-of-the-art systems.
Therefore, the VGBE workshop seeks to pioneer next-generation evaluation methodologies that are
fine-grained, physically grounded, and aligned with human perception.
By establishing multi-dimensional, explainable, and standardized benchmarks, we aim to bridge the
gap between generation and assessment, thereby accelerating the maturation of video generative
models and facilitating their reliable deployment in real-world applications.
Topics of Interest
🏆 Awards: Our workshop will present a Best Paper Award and a Best Paper Runner-Up Award to recognize outstanding contributions.
We invite contributions on (but not limited to):
Novel Metrics and Evaluation Methods
- Spatiotemporal & Causal Integrity: Quantifying motion realism, object permanence, and causal logic consistency over time.
- Perceptual Quality Assessment: Learning-based metrics for detecting visual artifacts, hallucinations, and alignment with human subjectivity.
- Explainable Automated Judges: Leveraging multimodal large language models (MLLMs) for scalable, fine-grained, and interpretable critique.
- Instruction Adherence Metrics: Rigorous evaluation of prompt fidelity, spatial conditioning, and complex constraint satisfaction.
Datasets and Benchmarks
- Narrative & Multi-Shot Suites: Curated datasets assessing character persistence, scene transitions, and long-horizon consistency.
- Physics-Grounded Challenge Sets: Scenarios isolating fluid dynamics, collisions, and kinematic anomalies to stress-test "World Simulators."
- Human Preference Data: Large-scale, fine-grained annotations capturing multi-dimensional judgments (e.g., aesthetics vs. realism).
- Standardized Protocols: Unified data splits and reproducible frameworks to ensure transparent and comparable benchmarking.
Video Generative Applications in Vertical Domains
- Domain Adaptation & Personalization: Efficient fine-tuning and Low-Rank Adaptation (LoRA) strategies for specialized verticals (e.g., medical, cinematic).
- Simulation for Embodied AI: Leveraging video generative models as world simulators for robotics perception, planning, and Sim2Real transfer.
- Interactive & Human-in-the-Loop: User-centric frameworks incorporating iterative feedback for creative workflows and gaming.
- Immersive 4D Generation: Lifting video diffusion priors to synthesize spatially consistent scenes and dynamic assets for AR/VR environments.
- Deployment Efficiency: Optimizing inference latency, memory footprint, and cost for scalable industrial applications.
VGBE 2026 Challenges
Submissions will be evaluated on the test set using the metrics defined in the associated paper, with human evaluation conducted for each task as needed.
Image-to-Video Consistent Generation
- Objective: Generate video from an input image and text prompt while preserving the image's visual content and maintaining spatiotemporal consistency.
- Awards:
- 🏆 1st Place: $1,000 + Certificate
- 🥈 2nd Place: $600 + Certificate
- 🥉 3rd Place: $300 + Certificate
- Data Usage: Please follow the Dataset License for data access and usage.
Competition Timeline
- Competition starts: February 19, 2026
- Results and code submission deadline: April 1, 2026
Text-conditioned General Video Editing
- Objective: Edit input videos according to natural language instructions while preserving visual quality and fidelity.
- Awards:
- 🏆 Highest Score Award: $500 + Certificate
- 🌟 Innovation Award: $500 + Certificate
Competition Timeline
- Competition starts: February 20, 2026
- Results submission deadline: March 25, 2026
Environment-aware Video Instance Removal
- Objective: Remove target instances and restore realistic environment dynamics with minimal artifacts.
- Awards:
- 🏆 Highest Score Award: $500 + Certificate
- 🌟 Innovation Award: $500 + Certificate
Competition Timeline
- Competition starts: February 20, 2026
- Results submission deadline: March 25, 2026
Keynote Speakers (Tentative)
Organizers
Challenge Organizers
Paper Submission
Important Dates
- Submissions open: January 28, 2026, 08:00 AM UTC-0
- Submissions due: March 10, 2026, 11:59 PM UTC-0
- Author notification: TBD
- Camera-ready due: TBD
Submission Guidelines
We welcome two types of submissions: short papers (2–4 pages) and full papers (5–8 pages). All submissions must follow the CVPR 2026 author guidelines.
Paper Tracks
Full Papers
- Length: Up to 8 pages (excluding references)
- Content: No appendix or supplementary material in the main PDF
- Proceedings: Included in official CVPR proceedings
- Scope: Full research contributions
Short Papers
- Length: Up to 4 pages (excluding references)
- Content: No appendix or supplementary material in the main PDF
- Proceedings: Not included in the official proceedings (archived on the workshop website)
- Scope: Work-in-progress and preliminary results