Welcome to the ECCV 2026 Workshop on Multimodal Large Language Models for Unified Comprehension and Generation. This workshop aims to consolidate emerging research on unified multimodal intelligence, with a focus on systems that understand, generate, and act across modalities within a coherent framework.
Multimodal AI has recently evolved from vision-language understanding toward broader multimodal foundation models that span image, video, audio, and 3D and that cover generation as well as understanding. At the same time, the community is moving from modular pipelines toward unified tokenization, hybrid autoregressive-diffusion designs, shared representations, and synergistic learning between understanding and generation.
Our goal is to bring together researchers from academia and industry to discuss architectural designs, tokenization strategies, training objectives, evaluation protocols, and practical challenges for building general-purpose multimodal systems.
Topics and Themes
We welcome technical, position, and perspective papers related to unified multimodal modeling. Topics of interest include, but are not limited to:
Unified Multimodal Understanding
Captioning, VQA, retrieval, grounding, segmentation, reasoning, visual document understanding, long-video understanding, and cross-modal knowledge extraction.
Unified Multimodal Content Generation
Text-to-image/video, image-to-image, controllable generation, sequential image generation, visual editing, and cross-modal generative modeling.
Unified MLLM Understanding and Generation
Architectures and objectives that jointly model comprehension and generation, including autoregressive, diffusion, flow-based, and hybrid paradigms.
Synergistic Learning
How understanding and generation, or different modalities and tasks, can mutually enhance each other during pre-training, instruction tuning, and post-training.
Benchmarking and Evaluation
Evaluation protocols and benchmarks for unified multimodal systems, including realism, controllability, generalization, reasoning, and fair comparison.
Broader Directions
Reinforcement learning for unified modeling, multimodal chain-of-thought, joint vision-language/audio/3D models, cross-task transfer, and efficient training.
Submission Instructions
The workshop accepts two submission tracks to encourage broader participation:
Regular archival papers may be up to 14 pages long, including figures and tables, and should use Springer LNCS formatting. Additional pages containing only cited references are permitted. Submissions must follow the ECCV 2026 template and be uploaded through OpenReview. The review process is double-blind and managed by the workshop organizers and program committee. Conflicts of interest will be handled according to the ECCV 2026 Submission Policy.
Non-archival submissions are intended for work that is already published or that authors prefer not to include in the proceedings. Eligible papers include those already peer-reviewed at major CV/ML conferences or journals. Previously reviewed or published papers will not be re-reviewed; acceptance is based on topical fit and poster-board availability. Unpublished submissions in this track will undergo double-blind review, following the same review process as regular archival submissions. Authors should submit the paper or a link to the paper via the workshop's OpenReview page.
Important Dates (AoE)
Schedule (tentative)
The workshop will be hybrid, supporting both onsite and online participation. The program consists of invited keynote talks, oral and poster presentations of accepted papers, and a closing panel on future directions in unified multimodal intelligence.
| Time | Schedule | Speaker |
|---|---|---|
| 08:50 – 09:00 | Introduction and Opening Remarks | Organizers |
| 09:00 – 09:30 | Keynote Talk 1 | TBD |
| 09:30 – 10:00 | Keynote Talk 2 | TBD |
| 10:00 – 10:40 | Oral Presentations (Session 1) | Selected authors |
| 10:40 – 11:00 | Coffee Break | |
| 11:00 – 11:30 | Keynote Talk 3 | TBD |
| 11:30 – 12:00 | Keynote Talk 4 | TBD |
| 12:00 – 12:30 | Poster Session 1 (Interactive) + Virtual Gallery | Accepted authors |
| 12:30 – 13:30 | Lunch Break | |
| 13:30 – 14:00 | Keynote Talk 5 | TBD |
| 14:00 – 14:30 | Keynote Talk 6 | TBD |
| 14:30 – 15:20 | Poster Session 2 (Interactive) + Virtual Gallery | Accepted authors |
| 15:20 – 15:50 | Coffee Break | |
| 15:50 – 16:20 | Keynote Talk 7 | TBD |
| 16:20 – 17:20 | Panel Discussion | TBD |
| 17:20 – 17:30 | Closing Remarks + Best Paper Award | Organizers |
Invited Speakers
Organizing Committee
Diversity and Inclusion
We are committed to promoting diversity and inclusion across the organizing committee, invited speakers, program committee, accepted papers, and audience. We will proactively encourage submissions from underrepresented groups and institutions, use inclusive language in the call for papers, support mentoring opportunities during poster sessions and panels, and ensure virtual access for remote or resource-constrained participants.
Contact
Questions? Please contact the workshop organizers (shengqiongwu@gmail.com).