Research Highlights
Online Map Vectorization for Autonomous Driving: A Rasterization Perspective
TL;DR Rasterization can offer complementary benefits to map vectorization. Motivated by this, we propose (i) MapVR, a precise map vectorization framework bridging map rasterization and map vectorization, and (ii) a highly sensitive rasterization-based metric for map vectorization.
Gongjie Zhang, Jiahao Lin, Shuang Wu, Yilin Song, Zhipeng Luo, Yang Xue, Shijian Lu, and Zuoguan Wang
The Thirty-seventh Annual Conference on Neural Information Processing Systems (NeurIPS), 2023
Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors
TL;DR This paper presents IMFA (Iterative Multi-scale Feature Aggregation), the first generic paradigm to efficiently leverage multi-scale features in Transformer-based object detectors (e.g., DETR and Anchor-DETR). For efficiency, IMFA samples multi-scale features only from a few crucial keypoints within a few promising regions. We demonstrate on multiple detectors that even such extremely sparse multi-scale features substantially improve detection accuracy at a small computational overhead.
Gongjie Zhang, Zhipeng Luo, Zichen Tian, Jingyi Zhang, Xiaoqin Zhang, and Shijian Lu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
Meta-DETR: Image-Level Few-Shot Detection with Inter-Class Correlation Exploitation
TL;DR Meta-DETR is a state-of-the-art few-shot object detector that performs image-level meta-learning-based prediction and effectively exploits inter-class correlation to enhance generalization from old knowledge to new classes. Meta-DETR entirely bypasses the proposal quality gap between base and novel classes, thus outperforming R-CNN-based few-shot object detectors. In addition, Meta-DETR performs meta-learning on a set of support classes in one go, thus effectively leveraging inter-class correlation for better generalization.
Gongjie Zhang, Zhipeng Luo, Kaiwen Cui, Shijian Lu, and Eric P. Xing
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2022
NOTE: Meta-DETR first appeared as a tech report on arXiv.org [1st version] [2nd version]. Since its release, we have made substantial improvements to the original versions. We recommend reading the final published version accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) in 2022.
Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale Feature Fusion
TL;DR This paper presents SAM-DETR++, an extension of SAM-DETR. It extends the semantic-aligned matching mechanism to fuse multi-scale features that are inherently unaligned in semantics, achieving even faster convergence and superior detection accuracy.
Gongjie Zhang, Zhipeng Luo, Jiaxing Huang, Shijian Lu, and Eric P. Xing
International Journal of Computer Vision (IJCV), 2024
Accelerating DETR Convergence via Semantic-Aligned Matching
TL;DR This paper presents SAM-DETR, an efficient DETR-like object detector that can converge within 12 epochs and outperform the strong Faster R-CNN (w/ FPN) baseline on the COCO benchmark. It proposes a semantic-aligned matching mechanism to accelerate DETR’s training convergence.
Gongjie Zhang, Zhipeng Luo, Yingchen Yu, Kaiwen Cui, and Shijian Lu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
Defect-GAN: High-Fidelity Defect Synthesis for Automated Defect Inspection
TL;DR Defect samples are usually rare and expensive to label. This paper presents Defect-GAN to perform high-fidelity defect synthesis with many normal samples and a limited number of defect samples. We show that synthesized defect samples can be effectively leveraged to boost inspection accuracy. The technique is patent-protected and is being used in real scenarios.
Gongjie Zhang, Kaiwen Cui, Tzu-Yi Hung, and Shijian Lu
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021
PNPDet: Efficient Few-Shot Detection Without Forgetting via Plug-and-Play Sub-Networks
TL;DR This paper presents a fine-tuning-based few-shot object detector that learns new concepts without forgetting previously learned visual concepts via weight-normalized sub-networks.
Gongjie Zhang, Kaiwen Cui, Rongliang Wu, Shijian Lu, and Yonghong Tian
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021
Cascade EF-GAN: Progressive Facial Expression Editing with Local Focuses
TL;DR We achieve realistic and vivid facial expression editing by designing separate branches that focus on specific areas (e.g., eyes and nose) and by performing the expression transformation iteratively.
Rongliang Wu, Gongjie Zhang, Shijian Lu, and Tao Chen
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Oral), 2020
CAD-Net: A Context-Aware Detection Network for Objects in Remote Sensing Imagery
TL;DR One of the earliest works to apply Faster R-CNN to rotated object detection in remote sensing images. The proposed CAD-Net effectively uses multi-level contextual information for robust object detection in satellite imagery.
Gongjie Zhang, Shijian Lu, and Wei Zhang
IEEE Transactions on Geoscience and Remote Sensing (T-GRS), vol. 57, no. 12, 2019