MUCG @ ACM MM 2025


The 1st International Workshop on MLLM for Unified Comprehension and Generation


Dublin, Ireland

October 27–31, 2025

Workshop Introduction

As Multimodal Large Language Models (MLLMs) continue to advance, there is a growing need to bridge the gap between their comprehension and generation capabilities within unified frameworks. This workshop, MLLM for Unified Comprehension and Generation (MUCG), aims to explore and address the fundamental challenges in developing truly integrated MLLMs that can seamlessly understand and create multimodal content. We focus on three interconnected areas:

  • Sophisticated multimodal comprehension, targeting robust understanding of complex visual content and semantic relationships.
  • Controllable content generation, addressing challenges in high-fidelity synthesis and cross-modal consistency.
  • Unified frameworks that enable semantic alignment between understanding and generation tasks.

Unlike previous approaches that treat these capabilities separately, our workshop specifically targets their integration through MLLMs, fostering focused discussions on shared architectures, bidirectional knowledge transfer, and end-to-end training strategies.

Topics

MLLM for Multimodal Comprehension & Reasoning

  • Single/Multiple Image Understanding
  • Short/Long Video Understanding
  • 3D Scene/Object Comprehension
  • Visual Document Understanding
  • Multi-view Scene Analysis
  • Complex Visual Reasoning
  • Temporal-Spatial Understanding
  • Cross-modal Knowledge Extraction
  • Visual Relationship Detection
  • Visual Question Answering
  • Visual Information Retrieval
  • Scene Graph Understanding
  • Cross-modal/Interleaved Reasoning
  • Multimodal Chain-of-Thought Reasoning

MLLM for Multimodal Content Generation

  • Text-to-Image/Video Synthesis
  • 3D Content Generation
  • Motion Sequence Generation
  • Visual Story Generation
  • Multi-image Coherent Generation
  • Auto-regressive Visual Generation
  • Layout-to-Image Generation
  • Cross-modal Style Transfer
  • Visual Content Editing
  • Multimodal Dialogue Generation
  • Sequential Image Generation
  • Conditioned Visual Generation

Unified MLLM Understanding and Generation

  • Unified Encoder-Decoder Frameworks
  • Joint Vision-Language Models
  • Agentic Systems
  • Autoregressive/Transfusion LLMs
  • Multi-task Learning Strategies
  • Cross-task Knowledge Transfer
  • Shared Representation Learning
  • Vision-Language Alignment
  • Multi-modal Fusion Methods
  • End-to-end Training Approaches
  • Instruction Tuning for Unification
  • Unified Tokenization Strategies
  • Bidirectional Generation Methods

Schedule

Topic | Duration | Speaker | Organization
TBD | TBD | TBD | TBD

Activities

We plan to hold the workshop in a hybrid format, with both onsite and online participation. For the onsite part, at least three organizers will attend in person to host the workshop. The workshop will include two major activities: invited keynotes and paper presentations. Keynote talks will be followed by presentations of accepted workshop papers.

Paper Submission and Reviewing

Submission Types

  • Technical Papers

    Original research contributions that present novel ideas, methods, or results in the area of multimodal large language models for unified comprehension and generation. These papers should not exceed 8 pages (plus unlimited pages for references).

  • Perspective Papers

    Position papers that discuss new perspectives, challenges, or future directions in the field. These should also be up to 8 pages in length.

  • Demonstration Papers

    Descriptions of working systems, tools, or demonstrations that showcase innovative applications of MLLMs. These papers should be up to 4 pages long.

  • Extended Abstracts

    Non-archival extended abstracts of previously published work or work in progress. These should be up to 2 pages in length.

Important Guidelines

  • Formatting

    Templates for both LaTeX and Word are available on the ACM website.

  • Double-Blind Review

    Submissions must be anonymized to ensure a fair review process. Authors should not identify themselves in the paper.

  • Supplementary Material

    Authors may submit supplementary material (e.g., code, data, videos) of up to 100 MB. It should be referenced in the paper and does not count toward the page limit.

A Best Paper Award will be selected from the accepted papers and announced during the workshop.

Reviewing Process

All submissions will undergo a rigorous double-blind peer review process. Each paper will be evaluated by at least three members of the program committee based on:

  • Relevance to MLLM for unified comprehension and generation
  • Originality and significance of contributions
  • Technical quality and depth
  • Clarity of presentation
  • Potential impact on the field

Important Dates

  • Paper Submission Start: 8 April 2025 (AoE)
  • Paper Submission Deadline: 14 July 2025 (AoE)
  • Notification of Acceptance: 28 July 2025 (AoE)
  • Camera-ready Submission: 23 August 2025 (AoE)
  • Workshop Dates: 27–28 October 2025

Organization Team

Jiayi Ji

National University of Singapore

Hao Fei

National University of Singapore

Gen Luo

Shanghai Artificial Intelligence Laboratory

Yaoting Wang

University of Edinburgh

Liang Zheng

Australian National University

Chia-Wen Lin

National Tsing Hua University

Shuicheng Yan

National University of Singapore

Rongrong Ji

Xiamen University

Tat-Seng Chua

National University of Singapore