Jinheng XIE

Hi there! I'm Jinheng, a second-year PhD student at Show Lab, National University of Singapore, working with Prof. Mike Shou. My research focuses on multi-modal computer vision. Previously, I worked on label-efficient learning for scene understanding (object localization and segmentation in unsupervised and weakly supervised settings). I'm currently exploring multi-modal pre-training and generation, such as generative pre-training and text-to-image generation.
Google Scholar  •   GitHub  •   LinkedIn

sierkinhane at gmail dot com / jinheng at u dot nus dot edu

2024/08

We release Show-o, a unified model that handles multimodal understanding and generation in one single transformer. Code and models are available here.

2024/02

One paper got accepted to CVPR 2024

2023/12

One paper got accepted to AAAI 2024

2023/09

Two papers got accepted to NeurIPS 2023

2023/07

One paper got accepted to ACM MM 2023

2023/07

One paper got accepted to ICCV 2023 and one paper got accepted to MICCAI 2023

2022/11

Served as a reviewer for ICCV 2023

2022/11

Served as a reviewer for CVPR 2023

2022/10

Served as a reviewer for IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

2022/09

Served as a reviewer for International Journal of Computer Vision (IJCV)

2022/09

Received China National Scholarship

2022/08

One paper got accepted to ECCV 2022 Workshop

2022/08

Ranked 4th in Out-of-Distribution Visual Recognition ECCV'2022 NICO Challenge

2022/06

One paper got accepted to MICCAI 2022

2022/03

Three papers got accepted to CVPR 2022

2021/09

Received China National Scholarship

2021/07

One paper got accepted to ICCV 2021
★ 2024 ★

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Jinheng Xie*, Weijia Mao*, Zechen Bai*, David Junhao Zhang*, Weihao Wang, Kevin Qinghong Lin, Yuchao Gu, Zhijie Chen, Zhenheng Yang, Mike Zheng Shou
arXiv preprint arXiv:2408.12528
PDF  •   Code

Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models

Wentian Zhang*, Haozhe Liu*, Jinheng Xie*, Francesco Faccio, Mike Zheng Shou, Jürgen Schmidhuber
arXiv preprint arXiv:2404.02747
PDF  •   Code

Tune-An-Ellipse: CLIP Has Potential to Find What You Want

Jinheng Xie, Songhe Deng, Bing Li, Haozhe Liu, Yawen Huang, Yefeng Zheng, Jürgen Schmidhuber, Bernard Ghanem, Linlin Shen, Mike Zheng Shou
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

HEAP: Unsupervised Object Discovery and Localization with Contrastive Grouping

Xin Zhang, Jinheng Xie, Yuan Yuan, Michael Bi Mi, Robby T. Tan
Thirty-eighth Annual AAAI Conference on Artificial Intelligence (AAAI), 2024
PDF  

★ 2023 ★

Learning Visual Prior via Generative Pre-Training

Jinheng Xie, Kai Ye, Yudong Li, Yuexiang Li, Kevin Qinghong Lin, Yefeng Zheng, Linlin Shen, Mike Zheng Shou
Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023
Webpage  •   PDF  •   Code

Dynamically Masked Discriminator for GANs

Wentian Zhang, Haozhe Liu, Bing Li, Jinheng Xie, Yawen Huang, Yuexiang Li, Yefeng Zheng, Bernard Ghanem
Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023
PDF  •   Code

BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion

Jinheng Xie, Yuexiang Li, Yawen Huang, Haozhe Liu, Wentian Zhang, Yefeng Zheng, Mike Zheng Shou
IEEE/CVF International Conference on Computer Vision (ICCV), 2023
PDF  •   Code

★ 2022 ★

Decoupled Mixup for Out-of-Distribution Visual Recognition

Haozhe Liu*, Wentian Zhang*, Jinheng Xie*, Haoqian Wu, Bing Li, Ziqi Zhang, Yuexiang Li, Yawen Huang, Bernard Ghanem, Yefeng Zheng
European Conference on Computer Vision Workshop (ECCVW), 2022
PDF  •   Code

Point Beyond Class: A Benchmark for Weakly Semi-Supervised Abnormality Localization in Chest X-Rays

Haoqin Ji, Haozhe Liu, Yuexiang Li, Jinheng Xie, Nanjun He, Yawen Huang, Dong Wei, Xinrong Chen, Linlin Shen, Yefeng Zheng
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2022
PDF  •   Code

CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation

Jinheng Xie, Xianxu Hou, Kai Ye, Linlin Shen
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
PDF  •   Code

C2AM: Contrastive Learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation

Jinheng Xie, Jianfeng Xiang, Junliang Chen, Xianxu Hou, Xiaodong Zhao, Linlin Shen
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
PDF  •   Code

Frequency-driven Imperceptible Adversarial Attack on Semantic Similarity

Cheng Luo, Qinliang Lin, Weicheng Xie, Bizhu Wu, Jinheng Xie, Linlin Shen
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
PDF  •   Code

★ 2021 ★

Online Refinement of Low-level Feature Based Activation Map for Weakly Supervised Object Localization

Jinheng Xie, Cheng Luo, Xiangping Zhu, Ziqi Jin, Weizeng Lu, Linlin Shen
IEEE/CVF International Conference on Computer Vision (ICCV), 2021
PDF  •   Code

2023

Show Lab Annual Award (4,000 SGD)

2023

Outstanding Graduate Award (Rate < 5%)

2022

China National Scholarship (Rate <= 0.02%)

2021

China National Scholarship (Rate <= 0.02%)

2021

Excellent Academic Scholarship, First Class
My favorite singer is "Liang Bo", and I would be delighted to recommend some of his songs to you.