MUCG @ ACM MM 2025


The 1st International Workshop on MLLM for Unified Comprehension and Generation


Dublin, Ireland

October 27-October 31, 2025

Workshop Introduction

As Multimodal Large Language Models (MLLMs) continue to advance, there is a growing need to bridge the gap between their comprehension and generation capabilities within unified frameworks. This workshop, MLLM for Unified Comprehension and Generation (MUCG), aims to explore and address the fundamental challenges in developing truly integrated MLLMs that can seamlessly understand and create multimodal content. We focus on three interconnected areas:

  • Sophisticated multimodal comprehension, targeting robust understanding of complex visual content and semantic relationships.
  • Controllable content generation, addressing challenges in high-fidelity synthesis and cross-modal consistency.
  • Unified frameworks that enable semantic alignment between understanding and generation tasks.

Unlike previous approaches that treat these capabilities separately, our workshop specifically targets their integration through MLLMs, fostering focused discussions on shared architectures, bidirectional knowledge transfer, and end-to-end training strategies.

WeChat Group

Scan to join our WeChat group

Call for Papers: Topics

We welcome paper submissions on all topics related to unified MLLMs, including but not limited to:

MLLM for Multimodal Comprehension & Reasoning

  • Single/Multiple Image Understanding
  • Short/Long Video Understanding
  • 3D Scene/Object Comprehension
  • Visual Document Understanding
  • Multi-view Scene Analysis
  • Complex Visual Reasoning
  • Temporal-Spatial Understanding
  • Cross-modal Knowledge Extraction
  • Visual Relationship Detection
  • Visual Question Answering
  • Visual Information Retrieval
  • Scene Graph Understanding
  • Cross-modal/Interleaved Reasoning
  • Multimodal Chain-of-Thought Reasoning

MLLM for Multimodal Content Generation

  • Text-to-Image/Video Synthesis
  • 3D Content Generation
  • Motion Sequence Generation
  • Visual Story Generation
  • Multi-image Coherent Generation
  • Auto-regressive Visual Generation
  • Layout-to-Image Generation
  • Cross-modal Style Transfer
  • Visual Content Editing
  • Multimodal Dialogue Generation
  • Sequential Image Generation
  • Conditioned Visual Generation

Unified MLLM Understanding and Generation

  • Unified Encoder-Decoder Frameworks
  • Joint Vision-Language Models
  • Agentic Systems
  • Autoregressive/Transfusion LLMs
  • Multi-task Learning Strategies
  • Cross-task Knowledge Transfer
  • Shared Representation Learning
  • Vision-Language Alignment
  • Multi-modal Fusion Methods
  • End-to-end Training Approaches
  • Instruction Tuning for Unification
  • Unified Tokenization Strategies
  • Bidirectional Generation Methods

Schedule

We plan to hold the workshop in a hybrid format, i.e., both onsite and online. For the onsite part, at least three organizers will attend in person to host the workshop. The workshop will include two major activities: invited keynotes and paper presentations. Invited keynote talks will be followed by presentations of accepted workshop papers.

The detailed schedule will be determined later.

Topic | Duration | Speaker | Organization
TBD | TBD | TBD | TBD

Keynote Speakers

The invited speakers will be determined later. Stay tuned!

Speaker-1

Institution

Speaker-2

Institution

Speaker-3

Institution

Paper Submission and Reviewing

Submission Types

  • Technical Papers

    Original research contributions that present novel ideas, methods, or results in the area of multimodal large language models for unified comprehension and generation. These papers should not exceed 8 pages (plus unlimited pages for references).

  • Perspective Papers

    Position papers that discuss new perspectives, challenges, or future directions in the field. These should also be up to 8 pages in length.

  • Demonstration Papers

    Descriptions of working systems, tools, or demonstrations that showcase innovative applications of MLLMs. These papers should be up to 4 pages long.

  • Extended Abstracts

    Non-archival extended abstracts of previously published work or work in progress. These should be up to 2 pages in length.

Important Guidelines

  • Formatting

    Templates are available from the ACM Website for both LaTeX and Word.

  • Double-Blind Review

    Submissions must be anonymized to ensure a fair review process. Authors should not identify themselves in the paper.

  • Supplementary Material

    Authors may submit supplementary material (e.g., code, data, videos) of up to 100MB. It should be referenced in the paper but does not count toward the page limit.

A Best Paper Award will be selected from the accepted papers and announced during the workshop.

Reviewing Process

All submissions will undergo a rigorous double-blind peer review process. Each paper will be evaluated by at least three members of the program committee based on:

  • Relevance to MLLM for unified comprehension and generation
  • Originality and significance of contributions
  • Technical quality and depth
  • Clarity of presentation
  • Potential impact on the field

Workshop Important Dates

  • Paper Submission Start: 08 April, 2025 (AoE)
  • Paper Submission Deadline: 11 July, 2025 (AoE)
  • Notification of Acceptance: 01 August, 2025 (AoE)
  • Camera-ready Submission: 11 August, 2025 (AoE)
  • Workshop dates: 27-28 October, 2025 (AoE)

Challenge

🏆 General-Level of Multimodal Generalist

We are excited to announce the General-Level of Multimodal Generalist Challenge, hosted in conjunction with the MUCG'25 workshop. This open challenge invites researchers and practitioners to develop and evaluate MLLMs/Agents that demonstrate generalist capabilities across diverse tasks and modalities.

The challenge is based on the General-Level platform, which provides a comprehensive evaluation suite (General-Level) and an extensive benchmark (General-Bench) for assessing the generalization abilities of MLLMs. Participants will have the opportunity to test their models on a wide range of tasks that require unified comprehension and generation across multiple modalities.

🎯 Challenge Tracks

The challenge comprises four distinct tracks, each focusing on different aspects of multimodal generalization:

  • 👑 Scope-A: Full-spectrum Hero: Full-spectrum leaderboard covering all modalities and tasks under General-Level, for highly capable, general-purpose multimodal models.
  • 💎 Scope-B: Modality-specific Unified Hero: Modality-specific leaderboards focusing on single modality or partially joint modality (e.g., image, video, audio, 3D) for modality-wise generalists.
  • 💪 Scope-C: Comprehension/Generation Hero: Leaderboards categorized by comprehension vs. generation tasks within each modality. Lower entry barrier for early-stage or lightweight models.
  • 🛠️ Scope-D: Skill-specific Hero: Fine-grained leaderboards focused on specific task clusters (e.g., VQA, Captioning, Speech Recognition), ideal for partial generalists.

Each track is designed to evaluate specific capabilities of MLLMs, encouraging the development of models that can generalize effectively across different types of tasks and data.

📅 Challenge Important Dates

  • Challenge Registration Start: 20 May, 2025 (AoE)
  • Challenge Registration End: 1 July, 2025 (AoE)
  • Notification of Rankings: 20 July, 2025 (AoE)
  • Challenge Submission Deadline: 30 July, 2025 (AoE)
  • Paper Camera-ready Submission: 11 August, 2025 (AoE)
  • Workshop dates: 27-28 October, 2025 (AoE)

📝 Participation Guidelines

  • Registration: Interested participants should register by filling out the Google Form.
  • Data and Tools: Visit the official Leaderboard site to obtain the datasets and the evaluation suite.
  • Submission: Participants must submit their model predictions on the Submit Page by the submission deadline for evaluation.

🏅 Awards and Recognition

Top-performing teams in each track will receive cash awards and certificates. Outstanding teams will be invited to write a technical paper for inclusion in the MUCG'25 workshop proceedings and to present it at the workshop.

Learn more about the challenge at the Multimodal-Generalist website and in the paper.

For further inquiries, please contact the challenge organizers at mugc-workshopmm25@googlegroups.com.

Organization Team

Workshop Organizers

Jiayi Ji

National University of Singapore

Hao Fei

National University of Singapore

Gen Luo

Shanghai Artificial Intelligence Laboratory

Yaoting Wang

University of Edinburgh

Liang Zheng

Australian National University

Chia-Wen Lin

National Tsing Hua University

Shuicheng Yan

National University of Singapore

Rongrong Ji

Xiamen University

Tat-Seng Chua

National University of Singapore

Challenge Committee

Shengqiong Wu

National University of Singapore

Jinfa Huang

University of Rochester

Daoan Zhang

University of Rochester

Program Committee

TBD

We are recruiting Program Committee members; if interested, please fill out the Google Form for nomination.

Contact

Join and post at our Google Group!
Email the organizers at mugc-workshopmm25@googlegroups.com.