MUCG @ ACM MM 2025


The 1st International Workshop on MLLM for Unified Comprehension and Generation


Dublin, Ireland

October 27-October 31, 2025

Workshop Introduction

As Multimodal Large Language Models (MLLMs) continue to advance, there is a growing need to bridge the gap between their comprehension and generation capabilities within unified frameworks. This workshop, MLLM for Unified Comprehension and Generation (MUCG), aims to explore and address the fundamental challenges in developing truly integrated MLLMs that can seamlessly understand and create multimodal content. We focus on three interconnected areas:

  • Sophisticated multimodal comprehension, targeting robust understanding of complex visual content and semantic relationships.
  • Controllable content generation, addressing challenges in high-fidelity synthesis and cross-modal consistency.
  • Unified frameworks that enable semantic alignment between understanding and generation tasks.

Unlike previous approaches that treat these capabilities separately, our workshop specifically targets their integration through MLLMs, fostering focused discussions on shared architectures, bidirectional knowledge transfer, and end-to-end training strategies.

WeChat Group

Scan to join our WeChat group

Call for Papers: Topics

We welcome paper submissions on all topics related to unified MLLMs, including but not limited to:

MLLM for Multimodal Comprehension & Reasoning

  • Single/Multiple Image Understanding
  • Short/Long Video Understanding
  • 3D Scene/Object Comprehension
  • Visual Document Understanding
  • Multi-view Scene Analysis
  • Complex Visual Reasoning
  • Temporal-Spatial Understanding
  • Cross-modal Knowledge Extraction
  • Visual Relationship Detection
  • Visual Question Answering
  • Visual Information Retrieval
  • Scene Graph Understanding
  • Cross-modal/Interleaved Reasoning
  • Multimodal Chain-of-Thought Reasoning

MLLM for Multimodal Content Generation

  • Text-to-Image/Video Synthesis
  • 3D Content Generation
  • Motion Sequence Generation
  • Visual Story Generation
  • Multi-image Coherent Generation
  • Auto-regressive Visual Generation
  • Layout-to-Image Generation
  • Cross-modal Style Transfer
  • Visual Content Editing
  • Multimodal Dialogue Generation
  • Sequential Image Generation
  • Conditioned Visual Generation

Unified MLLM Understanding and Generation

  • Unified Encoder-Decoder Frameworks
  • Joint Vision-Language Models
  • Agentic Systems
  • Autoregressive/Transfusion LLMs
  • Multi-task Learning Strategies
  • Cross-task Knowledge Transfer
  • Shared Representation Learning
  • Vision-Language Alignment
  • Multi-modal Fusion Methods
  • End-to-end Training Approaches
  • Instruction Tuning for Unification
  • Unified Tokenization Strategies
  • Bidirectional Generation Methods

Schedule

We plan to hold the workshop in a hybrid format, i.e., both onsite and online. For the onsite part, at least three organizers will attend in person to host the workshop. The workshop will include two major activities: invited keynotes and paper presentations. Invited keynote talks will be followed by presentations of accepted workshop papers.

The detailed schedule will be determined later.

Topic | Duration | Speaker | Organization
TBD | TBD | TBD | TBD

Keynote Speakers

The invited speakers will be determined later. Stay tuned!

Speaker-1

Institution

Speaker-2

Institution

Speaker-3

Institution

Paper Submission and Reviewing

Submission Types

  • Technical Papers

    Original research contributions that present novel ideas, methods, or results in the area of multimodal large language models for unified comprehension and generation. These papers should not exceed 8 pages (plus unlimited pages for references).

  • Perspective Papers

    Position papers that discuss new perspectives, challenges, or future directions in the field. These should also be up to 8 pages in length.

  • Demonstration Papers

    Descriptions of working systems, tools, or demonstrations that showcase innovative applications of MLLMs. These papers should be up to 4 pages long.

  • Extended Abstracts

    Non-archival extended abstracts of previously published work or work in progress. These should be up to 2 pages in length.

Important Guidelines

  • Formatting

    Templates are available from the ACM Website for both LaTeX and Word.

  • Double-Blind Review

    Submissions must be anonymized to ensure a fair review process. Authors should not identify themselves in the paper.

  • Supplementary Material

    Authors may submit supplementary material (e.g., code, data, videos) of up to 100MB. It should be referenced in the paper but does not count toward the page limit.

A Best Paper Award will be selected from the accepted papers and announced during the workshop.

Reviewing Process

All submissions will undergo a rigorous double-blind peer review process. Each paper will be evaluated by at least three members of the program committee based on:

  • Relevance to MLLM for unified comprehension and generation
  • Originality and significance of contributions
  • Technical quality and depth
  • Clarity of presentation
  • Potential impact on the field

Workshop Important Dates

  • Paper Submission Start: 08 April, 2025 (AoE)
  • Paper Submission Deadline: 11 July, 2025 (AoE)
  • Notification of Acceptance: 01 August, 2025 (AoE)
  • Camera-ready Submission: 11 August, 2025 (AoE)
  • Workshop dates: 27-28 October, 2025 (AoE)

Challenge

🏆 General-Level of Multimodal Generalist

We are excited to announce the General-Level of Multimodal Generalist Challenge, hosted in conjunction with the MUCG'25 workshop. This open challenge invites researchers and practitioners to develop and evaluate MLLMs/Agents that demonstrate generalist capabilities across diverse tasks and modalities.

The challenge is based on the General-Level platform, which provides a comprehensive evaluation suite (General-Level) and an extensive benchmark (General-Bench) for assessing the generalization abilities of MLLMs. Participants will have the opportunity to test their models on a wide range of tasks that require unified comprehension and generation across multiple modalities.

🎯 Challenge Tracks

The challenge comprises four distinct tracks, each focusing on different aspects of multimodal generalization:

  • 👑 Scope-A: Full-spectrum Hero: Full-spectrum leaderboard covering all modalities and tasks under General-Level, for highly capable, general-purpose multimodal models.
  • 💎 Scope-B: Modality-specific Unified Hero: Modality-specific leaderboards focusing on single modality or partially joint modality (e.g., image, video, audio, 3D) for modality-wise generalists.
  • 💪 Scope-C: Comprehension/Generation Hero: Leaderboards categorized by comprehension vs. generation tasks within each modality. Lower entry barrier for early-stage or lightweight models.
  • 🛠️ Scope-D: Skill-specific Hero: Fine-grained leaderboards focused on specific task clusters (e.g., VQA, Captioning, Speech Recognition), ideal for partial generalists.

Each track is designed to evaluate specific capabilities of MLLMs, encouraging the development of models that can generalize effectively across different types of tasks and data.

📅 Challenge Important Dates

  • Challenge Registration Start: 20 May, 2025 (AoE)
  • Challenge Registration End: 1 July, 2025 (AoE)
  • Notification of Rankings: 20 July, 2025 (AoE)
  • Challenge Submission Deadline: 30 July, 2025 (AoE)
  • Paper Camera-ready Submission: 11 August, 2025 (AoE)
  • Workshop dates: 27-28 October, 2025 (AoE)

📝 Participation Guidelines

  • Registration: Interested participants should register by filling out the Google Form.
  • Data and Tools: Visit the official Leaderboard site to obtain the datasets and the evaluation suite.
  • Submission: Participants must submit their model predictions on the Submit Page by the submission deadline for evaluation.

🏅 Awards and Recognition

Top-performing teams in each track will receive cash awards and certificates. Outstanding teams will be invited to write a technical paper for inclusion in the MUCG'25 workshop proceedings and to present it at the workshop.

Learn more about the challenge at the Multimodal-Generalist website and in the paper.

For further inquiries, please contact the challenge organizers at mugc-workshopmm25@googlegroups.com.

Organization Team

Workshop Organizers

Jiayi Ji

National University of Singapore

Hao Fei

National University of Singapore

Gen Luo

Shanghai Artificial Intelligence Laboratory

Yaoting Wang

University of Edinburgh

Liang Zheng

Australian National University

Chia-Wen Lin

National Tsing Hua University

Shuicheng Yan

National University of Singapore

Rongrong Ji

Xiamen University

Tat-Seng Chua

National University of Singapore

Challenge Committee

Shengqiong Wu

National University of Singapore

Jinfa Huang

University of Rochester

Daoan Zhang

University of Rochester

Program Committee

TBD

We are recruiting Program Committee members; if interested, please fill out the Google Form for nomination.

Contact

Join and post at our Google Group!
Email the organizers at mugc-workshopmm25@googlegroups.com.