Task-Specific → Unified
From narrow task pipelines toward foundation models that understand, reason, and generate in a unified manner.
Call for Papers · International Journal of Computer Vision
An IJCV Special Issue on MUCG: multimodal large language models that jointly perceive, reason, and generate within a single cohesive architecture, closing the loop between understanding and synthesis.
Multimodal large language models now process images, video, audio, 3D data, and embodied sensory inputs. Crucially, these systems are evolving beyond the traditional silos of perception (classification, detection, grounding) and synthesis (image and video generation), and increasingly aim to unify comprehension and generation within a single, cohesive architecture.
We define this emerging paradigm as MUCG — Multimodal Unified Comprehension and Generation. Its defining characteristic is not modality or task coverage alone, but the alignment among perceptual grounding, internal reasoning, and generative outputs — heterogeneous inputs in, heterogeneous outputs out, evaluated in a closed loop. This Special Issue consolidates the conceptual, methodological, and evaluative foundations of this shift.
Models deployed in embodied, interactive, and creative settings — comprehension guides generation; generation reveals comprehension.
New evaluation of understanding–generation consistency, grounded controllability, robustness, and reliability across tasks.
We seek principled frameworks, scalable learning paradigms, and reliable evaluation methodologies for unified multimodal systems. Topics of interest include, but are not limited to:
Prepare manuscripts according to the IJCV Submission Guidelines and select the article type “SI: Multimodal Unified Comprehension and Generation.” All papers will be peer-reviewed by at least three independent reviewers, following standard IJCV procedures.
Manuscripts must not have been published, or be under review, elsewhere. Each submission should include a cover letter explaining its relationship to the topic of this Special Issue. Papers receiving a Major Revision decision should be resubmitted within 3 months, and those receiving a Minor Revision within 1 month; each resubmission must include a detailed response to reviewers.
The submission deadline has passed.
All deadlines are Anywhere on Earth (AOE, UTC−12).
University of Oxford
University of Oxford
Stanford University
National University of Singapore
Nanyang Technological University
Monash University
Apple
UW–Madison & Adobe Research
UNC Chapel Hill
UC Merced
Submit through the IJCV Editorial Manager and select the article type “SI: Multimodal Unified Comprehension and Generation.” Follow the standard IJCV author guidelines.
If your work advances unified multimodal architectures, training and alignment, reasoning and controllability, evaluation, efficiency, or applications — see the Aims & Scope above — it is likely in scope. When in doubt, contact the guest editors.
Yes. Extended versions of conference papers are expected to contain at least 30% additional scientific contribution, for example new or improved algorithms or analysis, new experiments, or additional qualitative/quantitative comparisons. Such papers are acceptable as long as the submission notes that the paper extends a previously published conference paper.
No. You are welcome to submit your manuscript at any time before the deadline. Once your submission is received, we will initiate the review process as soon as possible.
Papers with a Major Revision decision should be resubmitted within 3 months; Minor Revision within 1 month. Revised submissions must include a detailed response to reviewers.