Hi, this is Xiaojie Xu (徐啸捷). I am an incoming Ph.D. student in Information Science and Technology at The University of Tokyo. My current research focuses on generative AI, including image, video, and multimodal generation. Representative works include:

  • Multimodal Generation: POSTA (visually appealing movie poster generation from text, CVPR 25), PreGenie (MLLM agents for text-image document understanding and presentation generation, EMNLP 25), Orchestrating Audio (MLLM agents for long-video understanding and audio generation, EMNLP 25)
  • Image/Video Generation: VBench++ (benchmarking video generative models, T-PAMI 25), BEV to Street View (street-view image generation from bird’s-eye-view maps, ICRA 24)

Previously, I did research at Shanda AI Research Tokyo, Tencent AI Lab, and NTU MMLab. Feel free to contact me about collaboration 🤠.

📝 Recent Publications

* indicates equal contributions. For a complete list of publications, please refer to my Google Scholar profile.

T-PAMI 2025

VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models

Ziqi Huang*, Fan Zhang*, Xiaojie Xu, Yinan He, Jiashuo Yu, Ziyue Dong, Qianli Ma, Nattapol Chanpaisit, Chenyang Si, Yuming Jiang, Yaohui Wang, Xinyuan Chen, Ying-Cong Chen, Limin Wang, Dahua Lin, Yu Qiao, Ziwei Liu

IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), GitHub stars > 1k

EMNLP 2025, Findings

PreGenie: An Agentic Framework for High-quality Visual Presentation Generation

Xiaojie Xu, Xinli Xu, Sirui Chen, Haoyu Chen, Fan Zhang, Ying-Cong Chen

Conference on Empirical Methods in Natural Language Processing (EMNLP), Findings

EMNLP 2025, Main

Long-Video Audio Synthesis with Multi-Agent Collaboration

Yehang Zhang, Xinli Xu, Xiaojie Xu, Doudou Zhang, Li Liu, Ying-Cong Chen

Conference on Empirical Methods in Natural Language Processing (EMNLP), Main

CVPR 2025

POSTA: A Go-to Framework for Customized Artistic Poster Generation

Haoyu Chen*, Xiaojie Xu*, Wenbo Li, Jingjing Ren, Tian Ye, Songhua Liu, Ying-Cong Chen, Lei Zhu, Xinchao Wang

Conference on Computer Vision and Pattern Recognition (CVPR)

ICRA 2024

From Bird’s-Eye to Street View: Crafting Diverse and Condition-Aligned Images with Latent Diffusion Model

Xiaojie Xu, Tianshuo Xu, Fulong Ma, Ying-Cong Chen

International Conference on Robotics and Automation (ICRA)

📖 Education