
Ruimao Zhang
Research Assistant Professor ( Computer Vision, Deep Learning, Embodied AI )
School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHKSZ)
Google Scholar, Semantic Scholar, Twitter, Zhihu
The primary objective of our research team is to develop intelligent agents that can effectively collaborate with humans in dynamic environments. To realize this ambition, we focus on three core research directions. (1) Human-centered Visual Content Understanding and Reasoning: enabling machines to actively perceive, analyze, and interpret human states, behaviors, and underlying motivations in dynamic scenarios. (2) Omni-modal Scene Perception and Navigation: harnessing diverse sensor modalities to comprehend and navigate complex scenes. (3) Machine Behavior Planning and Decision-making: equipping intelligent agents with the ability to make real-time decisions based on their understanding of the surrounding environment.
BIOGRAPHY
"Weakness and ignorance are not barriers to survival, but arrogance is."
---《The Three-Body Problem》, Cixin Liu
"Losing human nature, one loses much; losing bestial nature, one loses everything."
---《The Three-Body Problem》, Cixin Liu
Education
     Postdoctoral Fellow, Multimedia Lab, The Chinese University of Hong Kong, working with Prof. Xiaogang Wang ( co-founder of SenseTime ) and Prof. Ping Luo.
     Ph.D. in Computer Science and Technology, Sun Yat-sen University, advised by Prof. Liang Lin ( IEEE/IAPR Fellow, Distinguished Young Scholar of NSFC ).
     B.E. in Software Engineering.
Experience
      Research Assistant Professor, School of Data Science, The Chinese University of Hong Kong, Shenzhen.
      Research Scientist
      Senior Researcher, reporting to Prof. Jinwei Gu at SenseBrain, USA.
      Visiting Ph.D. Student, advised by Prof. Lei Zhang and Prof. Wangmeng Zuo.
      Research Assistant, advised by Prof. Liang Lin.
Awards and Honours
Academic Activity
      Executive Area Chair, Vision And Learning SEminar (VALSE), China
     Conference Reviewer: ICCV2023, ICML2023, CVPR2023, NeurIPS2022, ECCV2022, ICLR2022, NeurIPS2021, ICCV2021, CVPR2021, ICLR2021, AAAI2021, NeurIPS2020, ICCV2019, CVPR2019, ICME2016
     Journal Reviewer: T-PAMI ( IEEE Trans. on PAMI ), IJCV, T-NNLS, T-IP, T-MM, T-CSVT, T-DSC, T-IFS, Pattern Recognition, Neurocomputing
      "Computer Vision for Fashion, Art, and Design" at ICCV2019 and CVPR 2020.
PUBLICATIONS
Preprint
(* indicates corresponding author)
- Jie Yang, Ailing Zeng*, Ruimao Zhang*, Lei Zhang, "UniPose: Detecting Any Keypoints", arXiv preprint arXiv:2310.08530 (2023). ( The first unified framework, named UniPose, to detect keypoints of any articulated (e.g., human and animal), rigid, and soft objects via visual or textual prompts for fine-grained vision understanding and manipulation. ) 【PDF】
- Shunlin Lu, Ling-Hao Chen, Ailing Zeng, Jing Lin, Ruimao Zhang*, Lei Zhang, Heung-Yeung Shum*, "HumanTOMATO: Text-aligned Whole-body Motion Generation", arXiv preprint arXiv:2310.12978 (2023). ( A novel text-aligned whole-body motion generation framework that can generate high-quality, diverse, and coherent facial expressions, hand gestures, and body motions simultaneously. ) 【PDF】
- Jie Yang, Bingliang Li, Fengyu Yang, Ailing Zeng*, Lei Zhang, Ruimao Zhang*, "Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model", arXiv preprint arXiv:2305.12252 (2023). ( We introduce a novel scheme, DiffHOI, which leverages both the generative and representation capacities of pre-trained text-to-image diffusion models to enhance the performance of HOI detection tasks. ) 【PDF】
- Jiong Wang, Fengyu Yang, Wenbo Gou, Bingliang Li, Danqi Yan, Ailing Zeng, Yijun Gao, Junle Wang, Ruimao Zhang*, "FreeMan: Towards Benchmarking 3D Human Pose Estimation in the Wild", arXiv preprint arXiv:2309.05073 (2023). ( We present FreeMan, the first large-scale, real-world multi-view dataset for 3D human pose estimation in the wild. FreeMan was captured by synchronizing 8 smartphones across diverse scenarios. It comprises 11M frames from 8,000 sequences, viewed from different perspectives, covering 40 subjects in 10 different scenarios at 27 locations with varying lighting conditions. ) 【PDF】【Project】
Newly Accepted Articles
(* indicates corresponding author)
- Jing Lin, Ailing Zeng, Shunlin Lu, Yuanhao Cai, Ruimao Zhang, Haoqian Wang, Lei Zhang, "Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset", Proc. of Conference on Neural Information Processing Systems ( NeurIPS ), Datasets and Benchmarks Track, 2023 ( A large-scale 3D expressive whole-body human motion dataset (over 10M frames) with SMPL-X, text, audio, and RGB modalities. ) 【Homepage】【Code】
- Siyue Yao, Mingjie Sun, Bingliang Li, Fengyu Yang, Junle Wang, Ruimao Zhang*, "Dance with You: The Diversity Controllable Dancer Generation via Diffusion Models", Proc. of ACM International Conference on Multimedia ( ACM MM ), 2023 ( We introduce a novel multi-dancer synthesis task, partner dancer generation, which aims to synthesize virtual human dancers capable of dancing with users. ) 【PDF】【Dataset】
- Jie Yang, Ailing Zeng*, Feng Li, Shilong Liu, Ruimao Zhang*, Lei Zhang, "Neural Interactive Keypoint Detection", Proc. of IEEE International Conference on Computer Vision ( ICCV ), 2023 ( The first end-to-end neural interactive keypoint detection framework, which reduces labeling costs by more than 10 times. ) 【PDF】 【Code】 【Youtube】
- Yiran Qin, Chaoqun Wang, Zijian Kang, Ningning Ma, Zhen Li, Ruimao Zhang*, "SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection", Proc. of IEEE International Conference on Computer Vision ( ICCV ), 2023 ( The first work to discuss how to perform multi-modal fusion for 3D detection by constructing a supervisory signal. ) 【PDF】 【Code】
- Jie Yang, Chaoqun Wang, Zhen Li, Junle Wang, Ruimao Zhang*, "Semantic Human Parsing via Scalable Semantic Transfer over Multiple Label Domains", Proc. of IEEE International Conference on Computer Vision and Pattern Recognition ( CVPR ), 2023 ( We answer the question of how to leverage the mutual benefits of data with multiple labeling granularities to improve the performance of a human parsing network with a specific architecture. ) 【PDF】 【Code】
- Ye Zhu, Jie Yang, Siqi Liu, Ruimao Zhang*, "Inherent Consistent Learning for Accurate Semi-supervised Medical Image Segmentation", Proc. of Conference on Medical Imaging with Deep Learning ( MIDL ), 2023 ( Oral ) ( A novel plug-and-play module for effectively improving the performance of semi-supervised segmentation, especially for small organs and lesions. ) 【PDF】 【Code】
- Jie Yang, Ye Zhu, Chaoqun Wang, Zhen Li, Ruimao Zhang*, "Toward Unpaired Multi-modal Medical Image Segmentation via Learning Structured Semantic Consistency", Proc. of Conference on Medical Imaging with Deep Learning ( MIDL ), 2023 ( A novel scheme to learn the mutual benefits of different modalities to achieve better segmentation results for unpaired multi-modal medical images. ) 【PDF】
- Jie Yang, Ailing Zeng*, Shilong Liu, Feng Li, Ruimao Zhang*, Lei Zhang, "Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation", Proc. of International Conference on Learning Representations ( ICLR ), 2023 ( We reformulate the task of multi-person pose estimation with two explicit box detection processes and achieve new state-of-the-art performance. ) 【PDF】【Code】【Youtube】
Recent Selected Publications ( See Full List )
(* indicates corresponding author)
- Jie Yang, Ailing Zeng*, Feng Li, Shilong Liu, Ruimao Zhang*, Lei Zhang, "Neural Interactive Keypoint Detection", Proc. of IEEE International Conference on Computer Vision ( ICCV ), 2023 【PDF】 【Code】
- Yiran Qin, Chaoqun Wang, Zijian Kang, Ningning Ma, Zhen Li, Ruimao Zhang*, "SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection", Proc. of IEEE International Conference on Computer Vision ( ICCV ), 2023 【PDF】 【Code】
- Jie Yang, Chaoqun Wang, Zhen Li, Junle Wang, Ruimao Zhang*, "Semantic Human Parsing via Scalable Semantic Transfer over Multiple Label Domains", Proc. of IEEE International Conference on Computer Vision and Pattern Recognition ( CVPR ), 2023 【PDF】【Code】
- Ye Zhu, Jie Yang, Siqi Liu, Ruimao Zhang*, "Inherent Consistent Learning for Accurate Semi-supervised Medical Image Segmentation", Proc. of Conference on Medical Imaging with Deep Learning ( MIDL ), 2023 ( Oral ) 【PDF】【Code】
- Jie Yang, Ailing Zeng*, Shilong Liu, Feng Li, Ruimao Zhang*, Lei Zhang, "Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation", Proc. of International Conference on Learning Representations ( ICLR ), 2023 【PDF】【Code】【Youtube】
- Yuanfeng Ji, Haotian Bai, Jie Yang, Chongjian Ge, Ye Zhu, Ruimao Zhang*, Zhen Li*, Lingyan Zhang, Wanling Ma, Xiang Wan, Ping Luo*, "AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation", Proc. of Conference on Neural Information Processing Systems ( NeurIPS ), 2022 ( Oral ) 【PDF】【AMOS Challenge】
- Haotian Bai, Ruimao Zhang*, Jiong Wang, Xiang Wan, "Weakly Supervised Object Localization via Transformer with Implicit Spatial Calibration", Proc. of European Conference on Computer Vision ( ECCV ), 2022 【PDF】【Code】【Youtube】
- Ping Luo, Ruimao Zhang*, Jiamin Ren, Zhanglin Peng, Jingyu Li, "Switchable Normalization for Learning-to-Normalize Deep Representation", IEEE Transactions on Pattern Analysis and Machine Intelligence ( T-PAMI ), 43(2):712-728, 2021 ( IF:17.86 ) 【PDF】【Code】
- Ruimao Zhang, Zhanglin Peng, Lingyun Wu, Zhen Li, Ping Luo, "Exemplar Normalization for Learning Deep Representation", Proc. of IEEE International Conference on Computer Vision and Pattern Recognition ( CVPR ), 2020 【PDF】【Supp】
- Ruimao Zhang, Liang Lin, Guangrun Wang, Meng Wang, Wangmeng Zuo, "Hierarchical Scene Parsing by Weakly Supervised Learning with Image Descriptions", IEEE Transactions on Pattern Analysis and Machine Intelligence ( T-PAMI ), 41(3):596-610, 2019 ( IF:17.86 ) 【PDF】
- Ruimao Zhang, Jingyu Li, Hongbin Sun, Yuying Ge, Ping Luo, Xiaogang Wang, Liang Lin, “SCAN: Self-and-Collaborative Attention Network for Video Person Re-identification”, IEEE Transactions on Image Processing ( T-IP ), 28(10):4870-4882, 2019 【PDF】【Code】
MEMBERS
Ph.D. Students

Chaoqun Wang
Ph.D., since 2021, co-supervised with Prof. Tianwei Yu
Scene Understanding, Video Analysis, Multimodal Learning
M.S.: Nanjing Univ. of Sci. & Tech.
B.S.: Huazhong Univ. of Sci. & Tech.

Yiran Qin
Ph.D., since 2021, co-supervised with Prof. Zhen Li
Scene Understanding, Embodied AI, Large Visual Language Model
M.S.: not applicable
B.E.: Shandong University (Top 10%)

Jie Yang
Ph.D., since 2021, co-supervised with Prof. Zhen Li
Human Centric Visual Perception and Generation
M.S.: not applicable
B.E.: Harbin Engineering Univ. (Top 1%)

Ruiying Liu
Ph.D., since 2022, co-supervised with Prof. Tianshu Yu
AI+Science, Deep Graph Representation Learning
M.S.: Xiamen University
B.E.: Xiamen University

Shunlin Lu
Ph.D., since 2023, co-supervised with Prof. Benyou Wang
Multi-modal Learning, Human Centric Understanding
M.S.: University of Southern California
B.E.: Wuhan University of Technology
MPhil Students

Bingliang Li
MPhil, since 2022, School of Data Science
Scene Understanding, Referring Image Segmentation

Fengyu Yang
MPhil, since 2022, School of Data Science
Human Centric 3D Perception, Synthesis and Animation
Research Assistants

Hanqi Jiang
B.E., Beijing Jiaotong University
Cross-modal Learning, Transformer, CLIP

Jiayu Chang
B.E., Central South University
Human Centric Visual Perception and Generation
Alumni
      Current Position: Ph.D. student, The Hong Kong University of Science and Technology, Guangzhou (HKUST-GZ), China.
      Current Position: Ph.D. student, Xi'an Jiaotong-Liverpool University (XJTLU), China.
      Current Position: Ph.D. student, Hong Kong Baptist University (HKBU), Hong Kong, China.
      Current Position: Ph.D. student, Sun Yat-sen University (SYSU), China.
      Current Position: Ph.D. student, The Hong Kong University of Science and Technology, Guangzhou (HKUST-GZ), China.
      Current Position: Ph.D. student, University of Illinois Urbana-Champaign (UIUC), U.S.
      Current Position: Researcher, Shanghai Artificial Intelligence Laboratory, China.
TEACHING
      Instructor, The Chinese University of Hong Kong, Shenzhen.
      Instructor, The Chinese University of Hong Kong, Shenzhen.
      Instructor, The Chinese University of Hong Kong, Shenzhen.
      Instructor, The Chinese University of Hong Kong, Shenzhen.
      Taught by Prof. Liang Lin, @Sun Yat-sen University.
      Teaching Assistant
      Taught by Prof. Weishi Zheng, @Sun Yat-sen University.
      2+2 International Undergraduate Program, all in English.
      Teaching Assistant, Sun Yat-sen University.
      Taught by Prof. Liang Lin, @Sun Yat-sen University.
      Teaching Assistant
      Taught by Prof. Alan L. Yuille from UCLA, @Sun Yat-sen University.
      Summer Intensive Course, all in English.
      Teaching Assistant
CONTACT ME
Address: Room 517, Daoyuan Building, The Chinese University of Hong Kong, Shenzhen
E-mail: ruimao.zhang@ieee.org or zhangruimao@cuhk.edu.cn
Phone: (0755)23517042