Held in conjunction with CVPR 2026 (Denver), June 3–7, 2026
Main Theme: Low-Power Computing for Embedded Vision
Organized by: Matteo Poggi, Tse-Wei Chen, Branislav Kisacanin, Ahmed Nabil Belbachir, Marius Leordeanu
Description
Embedded vision is an active field of research that brings together efficient learning models with fast computer vision and pattern recognition algorithms to tackle the many areas of robotics and intelligent systems enjoying impressive growth today. This strong impact comes with many challenges, stemming from the difficulty of understanding complex visual scenes under the tight computational constraints that real-time solutions on embedded devices require. The Embedded Vision Workshop will provide a venue for discussing these challenges by bringing together researchers and practitioners from the fields outlined above.
Program
Important Dates
Paper submission: March 4, 2026
Review assignment: March 7, 2026
Review deadline: March 17, 2026
Notification of acceptance: March 20, 2026
Camera-ready submission: April 7, 2026
Please refer to the Submission page for details.
Accepted papers will be published in the workshop proceedings alongside the CVPR main conference and indexed in EI Compendex.
The submission system is open! (February 6, 2026)
CMT3 Site: https://cmt3.research.microsoft.com/EVW2026
The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.
Invited Talk #1

Speaker: Prof. Enzo Tartaglione
Title: Bringing Training to the Edge: Subspace and Sparsity for Efficient Deep Learning
Abstract: Modern deep networks are increasingly difficult to train on embedded vision platforms due to strict memory, compute, and energy constraints. In this talk, we present some approaches that make on-device training practical by reducing the cost of backpropagation through low-rank subspaces and dynamic sparsity. Across CNNs and transformers, these methods drastically cut memory usage and FLOPs while preserving accuracy, enabling efficient training and inference on resource-limited devices. Together, they point toward a future where embedded vision systems can learn, adapt, and personalize directly at the edge.
Biography: Enzo Tartaglione is a Full Professor at Télécom Paris, where he leads the Multimedia team, and is a Hi!Paris associate member. He is also a Member of the ELLIS Society, an IEEE Senior Member, an Associate Editor of IEEE Transactions on Neural Networks and Learning Systems, and an Action Editor for Transactions on Machine Learning Research. He received his MS in Electronic Engineering from Politecnico di Torino in 2015, cum laude. The same year, he also received an MS in Electrical and Computer Engineering from the University of Illinois at Chicago, magna cum laude. In 2016 he was awarded an MS in Electronics by Politecnico di Milano, cum laude. In 2019 he obtained a PhD in Physics from Politecnico di Torino, cum laude, with the thesis “From Statistical Physics to Algorithms in Deep Neural Systems”. His principal interests include compression and responsible (frugal) AI, privacy-aware learning, and debiasing and regularization for deep learning.
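To give a rough flavor of the low-rank subspace idea mentioned in the abstract (the talk's actual methods are not specified here), the sketch below projects a layer's gradient onto its top-r singular directions so that optimizer state and updates live in a much smaller subspace; all sizes and names are illustrative, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Full weight gradient of one layer: m x n (illustrative sizes).
m, n, r = 256, 128, 8
G = rng.standard_normal((m, n))

# Orthonormal basis P (m x r), e.g. the top-r left singular vectors
# of a recently observed gradient.
U, _, _ = np.linalg.svd(G, full_matrices=False)
P = U[:, :r]

# Optimizer state lives in the r x n subspace instead of m x n,
# shrinking its memory footprint by roughly a factor of m / r.
G_low = P.T @ G                   # project: (r x n)
update_low = -0.01 * G_low        # e.g. a plain SGD step in the subspace
update = P @ update_low           # lift back to full size: (m x n)

print(G_low.shape, update.shape)
```

The same basis can be refreshed periodically as training progresses, which is the usual way such subspace methods track the changing gradient geometry.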
Invited Talk #2

Speaker: Dr. Bowen Wen
Title: Building Foundation Models for Robotic Perception
Abstract: 3D spatial understanding is a critical capability for robotics that typically requires tedious manual design, expensive data collection, and per-domain training. This presentation will focus on the development and application of foundation models to address several fundamental challenges in robotic perception. First, we introduce FoundationStereo, a novel architecture designed to maximize zero-shot performance. We discuss the creation of a large-scale (1M pairs) synthetic dataset with iterative self-curation to eliminate ambiguity. We detail how we mitigate the sim-to-real gap by integrating rich monocular priors into the stereo pipeline. Furthermore, we examine the Attentive Hybrid Cost Filtering (AHCF) module that enables long-range context reasoning. Second, we address the computational bottlenecks of foundation models with Fast-FoundationStereo. We propose a “divide-and-conquer” acceleration strategy that retains the teacher model’s robustness while achieving a 10x speedup. Finally, we demonstrate how an automatic pseudo-labeling pipeline on 1.4M in-the-wild images further closes the performance gap. This results in a new family of stereo matching models that deliver foundation-model-level accuracy with real-time inference capabilities.
Biography: Bowen Wen is a Senior Research Scientist at NVIDIA Research, where his work focuses on large foundation models for 3D visual perception to advance embodied AI. He earned his PhD in Computer Science from Rutgers University in 2022 under the advisement of Prof. Kostas Bekris. During his PhD, he conducted several research internships at Google[X], Meta Reality Labs, and Amazon Lab126. As an impactful researcher in both computer vision and robotics, his leading projects received Best Paper Award nominations at both CVPR 2025 and RSS 2022.
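For readers unfamiliar with the cost volumes that modules like AHCF filter, the sketch below shows the classical baseline such learned models build on: a brute-force absolute-difference cost volume over candidate disparities with winner-take-all selection. All sizes are illustrative and the toy stereo pair is synthetic; this is not the FoundationStereo pipeline itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny rectified grayscale stereo pair (H x W); the right image is the
# left image's content shifted by a known disparity of 3 pixels.
H, W, max_disp, true_d = 8, 32, 6, 3
left = rng.random((H, W))
right = np.empty_like(left)
right[:, :W - true_d] = left[:, true_d:]   # shift content by true_d
right[:, W - true_d:] = 0.0                # fill the exposed border

# Cost volume: for each candidate disparity d, the absolute difference
# between left pixel x and right pixel x - d. Shape: (max_disp + 1, H, W).
cost = np.full((max_disp + 1, H, W), np.inf)
for d in range(max_disp + 1):
    cost[d, :, d:] = np.abs(left[:, d:] - right[:, :W - d])

# Winner-take-all: pick the disparity with minimum cost per pixel.
disparity = cost.argmin(axis=0)
```

Learned stereo networks replace the raw intensity difference with feature correlations and replace winner-take-all with filtered, context-aware aggregation, which is where long-range reasoning modules come in.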
Invited Talk #3
Invited Talk #4
Topics
- Agentic AI at the edge
- Embodied AI
- Lightweight and efficient computer vision algorithms for embedded systems
- Hardware dedicated to embedded vision systems (GPUs, FPGAs, DSPs, etc.)
- Software platforms for embedded vision systems
- Neuromorphic computing
- Applications of embedded vision systems in general domains: UAVs (industrial, mobile, and consumer), advanced assistance systems and autonomous navigation frameworks, augmented and virtual reality, robotics
- New trends and challenges in embedded visual processing
- Analysis of vision problems specific to embedded systems
- Analysis of embedded systems issues specific to computer vision
- Biologically-inspired vision and embedded systems
- Hardware and software enhancements that impact vision applications
- Performance metrics for evaluating embedded systems
- Hybrid embedded systems combining vision and other sensor modalities
- Embedded vision systems applied to new domains
Committee

Program Chair:
Matteo Poggi, University of Bologna

Program Chair:
Branislav Kisacanin, Institute for AI R&D (Serbia)
Faculty of Technical Sciences, U of Novi Sad (Serbia)

Publication Chair:
Tse-Wei Chen, Canon Inc. (Japan)

General Chair:
Marius Leordeanu, University Politehnica Bucharest (Romania)

General Chair:
Ahmed Nabil Belbachir, NORCE Norwegian Research Centre (Norway)
Steering Committee:
Marilyn Claire Wolf, University of Nebraska-Lincoln
Martin Humenberger, NAVER LABS Europe
Roland Brockers, Jet Propulsion Laboratory
Swarup Medasani, MathWorks
Stefano Mattoccia, University of Bologna
Jagadeesh Sankaran, Nvidia
Goksel Dedeoglu, Perceptonic
Margrit Gelautz, Vienna University of Technology
Branislav Kisacanin, Nvidia
Sek Chai, Latent AI
Zoran Nikolic, Nvidia
Ravi Satzoda, Nauto
Stephan Weiss, University of Klagenfurt
Program Committee:
Dragos Costea, University Politehnica of Bucharest
Assia Belbachir, NORCE Norwegian Research Centre
Fahim Hasan Khan, California Polytechnic State University, San Luis Obispo
Florin Condrea, Siemens Corporate Research
Mihai Masala, The Institute of Mathematics of the Romanian Academy (IMAR)
Dongchao Wen, IEIT SYSTEMS Co., Ltd.
Alina Marcu, The National University of Science and Technology Politehnica Bucharest
Faycal Bensaali, University of Qatar
Sam Leroux, Ghent University
Omkar Prabhune, Purdue University
Luca Bompani, University of Bologna
Linda M. Wills, Georgia Institute of Technology
Burak Ozer, Pekosoft LLC
Cevahir Cigla, ASELSAN
Natalia Jurado, Latent AI
Branislav Kisacanin, NVIDIA
Wei Tao, Canon Innovative Solution (Beijing) Co., Ltd.