My previous email GongjieZhang@ntu.edu.sg has expired.

Biography

Dr. Gongjie Zhang is an AI research scientist at Tongyi Lab, Alibaba Group. He holds a Ph.D. from Nanyang Technological University (NTU), Singapore (advised by Prof. Shijian Lu) and a B.Eng. from Northeastern University, China.

His ultimate goal is to build generalist Agentic AI — powered by Multimodal Large Language Models (MLLMs) — that can autonomously perceive, reason about, and interact with both the physical and virtual worlds:

  • Physical Environments — Embodied AI, 3D vision, MLLMs for spatial intelligence, Vision-Language-Action (VLA) models, robotic manipulation, and autonomous driving.
  • Virtual Environments — Training agentic MLLMs (SFT, RL, Visual CoT, Tool Use, MCP) to deliver product-ready GUI Agents for mobile, desktop, and web browsers.

He is currently focused on building product-ready mobile GUI agents that bring intelligent automation to everyday device interactions.


Contact

Email gjz [at] ieee [dot] org

Disclaimer: The views expressed on this website are solely my own and do not reflect the views of my employer or any other organization with which I am affiliated. None of the information provided here constitutes professional advice.

Last Update: March 2026

Selected Publications

Towards Scalable Spatial Intelligence via 2D-to-3D Data Lifting
Xingyu Miao, Haoran Duan, Quanhao Qian, Jiuniu Wang, Yang Long, Ling Shao, Deli Zhao, Ran Xu, and Gongjie Zhang
IEEE/CVF International Conference on Computer Vision (ICCV), 2025 (Highlight)

TL;DR We propose a scalable 2D-to-3D lifting pipeline that converts single-view images into scale-calibrated 3D (depth, camera poses, point clouds, pseudo-RGBD), releasing COCO-3D and Objects365-v2-3D datasets. Pretraining on these data boosts 3D perception, enables zero-shot 3D segmentation, and strengthens 3D vision-language/MLLM spatial reasoning — greatly reducing reliance on costly sensor-collected 3D data.

Online Map Vectorization for Autonomous Driving: A Rasterization Perspective
Gongjie Zhang, Jiahao Lin, Shuang Wu, Yilin Song, Zhipeng Luo, Yang Xue, Shijian Lu, and Zuoguan Wang
NeurIPS, 2023

TL;DR Rasterization can offer complementary benefits to map vectorization. We propose (i) MapVR, a precise map vectorization framework bridging rasterization and vectorization, and (ii) a highly sensitive rasterization-based metric for map vectorization.

Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors
Gongjie Zhang, Zhipeng Luo, Zichen Tian, Jingyi Zhang, Xiaoqin Zhang, and Shijian Lu
CVPR, 2023

TL;DR IMFA (Iterative Multi-scale Feature Aggregation) is the first generic paradigm to efficiently leverage multi-scale features in Transformer-based detectors. It samples multi-scale features only from a few crucial keypoints within a few promising regions, achieving high accuracy with small computational overhead.

Meta-DETR: Image-Level Few-Shot Detection with Inter-Class Correlation Exploitation
Gongjie Zhang, Zhipeng Luo, Kaiwen Cui, Shijian Lu, and Eric P. Xing
IEEE T-PAMI, 2022

TL;DR Meta-DETR performs image-level meta-learning-based prediction and exploits inter-class correlation to enhance generalization to new classes, entirely bypassing the proposal quality gap between base and novel classes.

Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale Feature Fusion
Gongjie Zhang, Zhipeng Luo, Jiaxing Huang, Shijian Lu, and Eric P. Xing
IJCV, 2024

TL;DR SAM-DETR++ extends the semantic-aligned matching mechanism to fuse multi-scale features that are inherently unaligned in semantics, achieving even faster convergence and superior detection accuracy.

Accelerating DETR Convergence via Semantic-Aligned Matching
Gongjie Zhang, Zhipeng Luo, Yingchen Yu, Kaiwen Cui, and Shijian Lu
CVPR, 2022

TL;DR SAM-DETR converges within 12 epochs and outperforms the strong Faster R-CNN (w/ FPN) baseline on COCO, using the proposed semantic-aligned matching mechanism.

Defect-GAN: High-Fidelity Defect Synthesis for Automated Defect Inspection
Gongjie Zhang, Kaiwen Cui, Tzu-Yi Hung, and Shijian Lu
WACV, 2021

TL;DR Defect-GAN performs high-fidelity defect synthesis using abundant normal samples and only a limited number of defect samples. The technique is patent-protected and is used in real-world applications.

PNPDet: Efficient Few-Shot Detection Without Forgetting via Plug-and-Play Sub-Networks
Gongjie Zhang, Kaiwen Cui, Rongliang Wu, Shijian Lu, and Yonghong Tian
WACV, 2021

TL;DR A fine-tuning-based few-shot object detector that learns new concepts without forgetting via weight-normalized sub-networks.

Cascade EF-GAN: Progressive Facial Expression Editing with Local Focuses
Rongliang Wu, Gongjie Zhang, Shijian Lu, and Tao Chen
CVPR (Oral), 2020

TL;DR Realistic facial expression editing performed progressively, with separate branches focusing on local regions such as the eyes and nose.

CAD-Net: A Context-Aware Detection Network for Objects in Remote Sensing Imagery
Gongjie Zhang, Shijian Lu, and Wei Zhang
IEEE T-GRS, vol. 57, no. 12, 2019

TL;DR One of the earliest works to apply Faster R-CNN to rotated object detection in remote sensing images, using multi-level contextual information.

Publications

2026
Gongjie Zhang, et al. "On the Generalization Capacities of MLLMs for Spatial Intelligence." ICLR 2026 (Oral).
Quanhao Qian, Guang Zhao, Gongjie Zhang, Jiuniu Wang, Jiaxin Gao, Deli Zhao, and Ran Xu. "GP3: A 3D Geometry-Aware Policy with Multi-View Images for Robotic Manipulation." ICRA 2026.
2025
Jiuniu Wang, Gongjie Zhang, Quanhao Qian, Jiaxin Gao, Deli Zhao, and Ran Xu. "RoboSVG: A Unified Framework for Interactive SVG Generation with Multi-modal Guidance." arXiv preprint, 2025.
Xingyu Miao, Haoran Duan, Quanhao Qian, Jiuniu Wang, Yang Long, Ling Shao, Deli Zhao, Ran Xu, and Gongjie Zhang. "Towards Scalable Spatial Intelligence via 2D-to-3D Data Lifting." ICCV 2025 (Highlight).
2024
Gongjie Zhang, Zhipeng Luo, Jiaxing Huang, Shijian Lu, and Eric P. Xing. "Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale Feature Fusion." IJCV, vol. 132, pp. 2825–2844, 2024.
Zhipeng Luo, Gongjie Zhang, Changqing Zhou, Zhonghua Wu, Qingyi Tao, Lewei Lu, and Shijian Lu. "Modeling Continuous Motion for 3D Point Cloud Object Tracking." AAAI 2024.
Qiu Han, Gongjie Zhang, Jiaxing Huang, Peng Gao, Zhang Wei, and Shijian Lu. "Efficient MAE Towards Large-Scale Vision Transformers." WACV 2024.
Zhipeng Luo, Changqing Zhou, Liang Pan, Gongjie Zhang, Tianrui Liu, Yueru Luo, Haiyu Zhao, Ziwei Liu, and Shijian Lu. "Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with Transformer." IEEE T-PAMI, vol. 46, no. 9, pp. 5921–5935, 2024.
Jiahao Nie, Yun Xing, Gongjie Zhang, Pei Yan, Aoran Xiao, Yap-Peng Tan, Alex C. Kot, and Shijian Lu. "Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining." CVPR 2024.
2023
Gongjie Zhang, Jiahao Lin, Shuang Wu, Yilin Song, Zhipeng Luo, Yang Xue, Shijian Lu, and Zuoguan Wang. "Online Map Vectorization for Autonomous Driving: A Rasterization Perspective." NeurIPS 2023.
Gongjie Zhang, Zhipeng Luo, Zichen Tian, Jingyi Zhang, Xiaoqin Zhang, and Shijian Lu. "Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors." CVPR 2023.
Jingyi Zhang, Jiaxing Huang, Zhipeng Luo, Gongjie Zhang, Xiaoqin Zhang, and Shijian Lu. "DA-DETR: Domain Adaptive Detection Transformer with Information Fusion." CVPR 2023.
Zhipeng Luo, Gongjie Zhang, Changqing Zhou, Tianrui Liu, Shijian Lu, and Liang Pan. "TransPillars: Coarse-to-Fine Aggregation for Multi-Frame 3D Object Detection." WACV 2023.
2022
Gongjie Zhang, Zhipeng Luo, Kaiwen Cui, Shijian Lu, and Eric P. Xing. "Meta-DETR: Image-Level Few-Shot Detection with Inter-Class Correlation Exploitation." IEEE T-PAMI, 2022.
Gongjie Zhang, Zhipeng Luo, Yingchen Yu, Kaiwen Cui, and Shijian Lu. "Accelerating DETR Convergence via Semantic-Aligned Matching." CVPR 2022.
Kaiwen Cui, Jiaxing Huang, Zhipeng Luo, Gongjie Zhang, Fangneng Zhan, and Shijian Lu. "GenCo: Generative Co-training for Generative Adversarial Networks with Limited Data." AAAI 2022.
2021
Zhipeng Luo, Zhongang Cai, Changqing Zhou, Gongjie Zhang, Haiyu Zhao, Shuai Yi, Shijian Lu, Hongsheng Li, Shanghang Zhang, and Ziwei Liu. "Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency." ICCV 2021.
Fangneng Zhan, Yingchen Yu, Kaiwen Cui, Gongjie Zhang, Shijian Lu, Jianxiong Pan, Changgong Zhang, Feiying Ma, Xuansong Xie, and Chunyan Miao. "Unbalanced Feature Transport for Exemplar-Based Image Translation." CVPR 2021.
Gongjie Zhang, Kaiwen Cui, Tzu-Yi Hung, and Shijian Lu. "Defect-GAN: High-Fidelity Defect Synthesis for Automated Defect Inspection." WACV 2021.
Gongjie Zhang, Kaiwen Cui, Rongliang Wu, Shijian Lu, and Yonghong Tian. "PNPDet: Efficient Few-Shot Detection Without Forgetting via Plug-and-Play Sub-Networks." WACV 2021.
2020
Rongliang Wu, Gongjie Zhang, Shijian Lu, and Tao Chen. "Cascade EF-GAN: Progressive Facial Expression Editing with Local Focuses." CVPR 2020.
2019
Gongjie Zhang, Shijian Lu, and Wei Zhang. "CAD-Net: A Context-Aware Detection Network for Objects in Remote Sensing Imagery." IEEE T-GRS, vol. 57, no. 12, 2019.

Patents

Defect Sample Synthesis Method, Training Method of Defect Inspection Network, Computer-Readable Medium, Computing System, and Image Inspection Apparatus

缺陷樣本合成方法、缺陷檢查網路之訓練方法、電腦可讀取媒體、計算系統以及圖像檢查裝置

Inventors: Gongjie Zhang, Kaiwen Cui, Shijian Lu, and Tzuyi Hung.

Proprietor: Delta Electronics Int'l (Singapore)

Singapore Patent: 10202114457P (Issued August 2022)

China Patent: CN114693595A (Issued July 2022)

Taiwan Patent: 202228069 (Issued July 2022)

System and Method for Map Vectorization in Advanced Driving Assistance System

Inventors: Gongjie Zhang, et al.

Proprietor: Black Sesame Technologies

U.S. Patent Application Publication No.: 20240404074

System and Method for Embedding Uncertainty Estimation into Deep-Neural-Network-Based Autonomous Driving Perception Frameworks

Inventors: Gongjie Zhang, et al.

Proprietor: Black Sesame Technologies

U.S. Patent Application Publication No.: 20250282376

Method for Generating 3D Scene Data from 2D Images for Spatial Intelligence

一种用于空间智能的 2D 到 3D 场景数据生成方法

Inventors: Gongjie Zhang, et al.

Proprietor: Alibaba Group

China Patent under review

A Spatially-Aware Multimodal Large Language Model Based on Camera Parameters for Cross-Camera Generalization, and Its Training Method

一种基于相机参数实现跨相机泛化的空间感知多模态大语言模型及其训练方法

Inventors: Gongjie Zhang, et al.

Proprietor: Alibaba Group

China Patent under review

Honors and Awards

  • Jun. 2018 — Outstanding Graduation Project of the School of CSE, Northeastern University
  • Nov. 2017 — National Scholarship of China
  • Dec. 2016 — HUAWEI Scholarship
  • Nov. 2015 — National Scholarship of China
  • 2014–2018 — 4-year First-Class Scholarship of Northeastern University
  • 2014–2018 — Numerous awards in undergraduate contests, including two national awards in the NUEDC (National Undergraduate Electronic Design Contest)

Conference Reviewer

  • CVPR 2021, 2023, 2024, 2025, 2026
  • ICCV 2021, 2023, 2025
  • ECCV 2024, 2026
  • NeurIPS 2025
  • ACM MM 2022, 2025, 2026
  • WACV 2021, 2024, 2025, 2026
  • BMVC 2022
  • ICPR 2024
  • CAI 2024

Conference Program Committee Member

  • AAAI 2024, 2025, 2026
  • CVPR 2024 Workshop on Neural Rendering Intelligence

Journal Reviewer

  • IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
  • International Journal of Computer Vision (IJCV)
  • IEEE Transactions on Image Processing (T-IP)
  • IEEE Transactions on Neural Networks and Learning Systems (T-NNLS)
  • IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT)
  • IEEE Robotics and Automation Letters (RA-L)
  • Neurocomputing

Teaching Assistant

  • NTU, Fall 2018: CE1008 Engineering Mathematics