MUCG @ ECCV 2026

Welcome to the ECCV 2026 Workshop on Multimodal Large Language Models for Unified Comprehension and Generation. This workshop aims to consolidate emerging research on unified multimodal intelligence, with a focus on systems that understand, generate, and act across modalities within a coherent framework.

Recent multimodal AI has evolved from vision-language understanding toward broader multimodal foundation models that span image, video, audio, and 3D, and that generate content as well as understand it. At the same time, the community is moving from modular pipelines toward unified tokenization, hybrid autoregressive-diffusion designs, shared representations, and synergistic learning between understanding and generation.

Our goal is to bring together researchers from academia and industry to discuss architectural designs, tokenization strategies, training objectives, evaluation protocols, and practical challenges for building general-purpose multimodal systems.

Topics and Themes

We welcome technical, position, and perspective papers related to unified multimodal modeling. Topics of interest include, but are not limited to:

01. Unified Multimodal Understanding

Captioning, VQA, retrieval, grounding, segmentation, reasoning, visual document understanding, long-video understanding, and cross-modal knowledge extraction.

02. Unified Multimodal Content Generation

Text-to-image/video, image-to-image, controllable generation, sequential image generation, visual editing, and cross-modal generative modeling.

03. Unified MLLM Understanding and Generation

Architectures and objectives that jointly model comprehension and generation, including autoregressive, diffusion, flow-based, and hybrid paradigms.

04. Synergistic Learning

How understanding and generation, or different modalities and tasks, can mutually enhance each other during pre-training, instruction tuning, and post-training.

05. Benchmarking and Evaluation

Evaluation protocols and benchmarks for unified multimodal systems, including realism, controllability, generalization, reasoning, and fair comparison.

06. Broader Directions

Reinforcement learning for unified modeling, multimodal chain-of-thought, joint vision-language/audio/3D models, cross-task transfer, and efficient training.

Submission Instructions

The workshop offers two submission tracks to encourage broader participation:

Regular Archival Submissions

Up to 14 pages

Regular archival papers may be up to 14 pages long, including figures and tables, and should use Springer LNCS formatting. Additional pages containing only cited references are permitted. Submissions must follow the ECCV 2026 template and be uploaded through OpenReview. The review process is double-blind and managed by the workshop organizers and program committee. Conflicts of interest will be handled according to the ECCV 2026 Submission Policy.

Non-Archival Submissions

At least 4 pages

Non-archival submissions are intended for work that is already published or that authors prefer not to include in the proceedings. Eligible papers include those already peer-reviewed at major CV/ML conferences or journals. Previously reviewed or published papers will not be re-reviewed; acceptance is based on topical fit and poster-board availability. Unpublished submissions in this track will undergo double-blind review, following the same review process as regular archival submissions. Authors should submit the paper or a link to the paper via the workshop's OpenReview page.

All accepted papers will be presented as posters.

Best Paper Awards will be selected based on reviewer scores and committee evaluation.

Submissions are handled through the official ECCV 2026 MUCG OpenReview site.


Important Dates (AoE)

Archival submission deadline: Jul 05 (originally Jul 01)

Archival notification: Jul 20 (originally Jul 18)

Non-archival submission deadline: Jul 25

Non-archival notification: Aug 7

Camera-ready deadline for all accepted papers: Aug 8–12

Metadata deadline: Aug 20

Workshop (during ECCV 2026): Sep 8–12

Schedule (tentative)

The workshop will be hybrid, supporting both onsite and online participation. The program consists of invited keynote talks, oral and poster presentations of accepted papers, and a closing panel on future directions in unified multimodal intelligence.

Time	Schedule	Speaker
08:50 – 09:00	Introduction and Opening Remarks	Organizers
09:00 – 09:30	Keynote Talk 1	TBD
09:30 – 10:00	Keynote Talk 2	TBD
10:00 – 10:40	Oral Presentations (Session 1)	Selected authors
10:40 – 11:00	Coffee Break
11:00 – 11:30	Keynote Talk 3	TBD
11:30 – 12:00	Keynote Talk 4	TBD
12:00 – 12:30	Poster Session 1 (Interactive) + Virtual Gallery	Accepted authors
12:30 – 13:30	Lunch Break
13:30 – 14:00	Keynote Talk 5	TBD
14:00 – 14:30	Keynote Talk 6	TBD
14:30 – 15:20	Poster Session 2 (Interactive) + Virtual Gallery	Accepted authors
15:20 – 15:50	Coffee Break
15:50 – 16:20	Keynote Talk 7	TBD
16:20 – 17:20	Panel Discussion	TBD
17:20 – 17:30	Closing Remarks + Best Paper Award	Organizers

Invited Speakers

Mike Z. Shou, National University of Singapore

Mohit Bansal, University of North Carolina Chapel Hill

Chen Change Loy, Nanyang Technological University

Rogerio Schmidt Feris, MIT-IBM Watson AI Lab

Jaemin Cho, Allen Institute for AI / Johns Hopkins University

Yuren Cong, Meta

Organizing Committee

Shengqiong Wu, Organizer, University of Oxford

Jinheng Xie, Organizer, National University of Singapore

Haozhe Liu, Organizer, NVIDIA Research

Tian Ye, Organizer, HKUST(GZ) & NVIDIA Research

Yanguang Zhao, Executor, National University of Singapore

Enze Xie, Organizer, NVIDIA Research / MIT HAN Lab

Sivan Doveh, Organizer, Stanford University

Anna Kukleva, Organizer, Meta & Max Planck Institute for Informatics

Jehanzeb Mirza, Organizer, Xero

Mingchen Zhuge, Organizer, KAUST

Hao Fei, Organizer, University of Oxford

Diversity and Inclusion

We are committed to promoting diversity and inclusion across the organizing committee, invited speakers, program committee, accepted papers, and audience. We will proactively encourage submissions from underrepresented groups and institutions, use inclusive wording in the call for papers, support mentoring opportunities during poster sessions and panels, and ensure virtual access for remote or resource-constrained participants.

Contact

Questions? Please contact the workshop organizers (shengqiongwu@gmail.com).