I am a Ph.D. candidate at Nanjing University, supervised by Professor Qing Gu (顾庆) and Assistant Professor Zhiwei Jiang (蒋智威). I am also a visiting Ph.D. student at Singapore Management University (SMU), funded by the China Scholarship Council (CSC) and advised by Associate Professor Qianru Sun and Assistant Professor Jiannan Li.
I have a broad interest in computer vision and deep learning, with a current focus on controllable and consistent generation in AIGC, including audio-driven video generation and text-to-image generation.
From May 2023 to May 2024, I was a research intern at Tencent AI Lab, where I worked on AIGC research under the mentorship of Kuan Tian (田宽) and Jun Zhang (张军).
Research Experience
- 2024.10 – Present, Visiting Ph.D. Student,
School of Computing and Information Systems (SCIS), Singapore Management University (SMU), Singapore.
- 2021.09 – Present, Ph.D. Candidate,
School of Computer Science, Nanjing University (NJU), Nanjing, China.
- 2023.05 – 2024.05, Research Intern,
Tencent AI Lab, Technology & Engineering Group (TEG), Tencent, Shenzhen, China.
Honors and Awards
- Outstanding Graduate Student, Nanjing University, 2023.
- Huawei Scholarship, Nanjing University, 2023.
- Yingcai Scholarship, Nanjing University, 2021.
Selected Publications
V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation;
Cong Wang*, Kuan Tian*, Jun Zhang†, Yonghang Guan, Feng Luo, Fei Shen, Zhiwei Jiang†, Qing Gu, Xiao Han, Wei Yang;
arXiv:2406.02511.
[code] [project page] [arXiv] [models]
TL;DR: V-Express generates a talking-head video under the control of a reference image, an audio clip, and a sequence of V-Kps images.
Ensembling Diffusion Models via Adaptive Feature Aggregation;
Cong Wang*, Kuan Tian*, Yonghang Guan, Jun Zhang†, Zhiwei Jiang†, Fei Shen, Xiao Han, Qing Gu, Wei Yang;
arXiv:2405.17082.
[code] [arXiv]
TL;DR: We propose Adaptive Feature Aggregation (AFA), which dynamically ensembles multiple diffusion models according to the current state, such as the prompt, the noise, and the spatial location.
Aggregating Multiple Heuristic Signals as Supervision for Unsupervised Automated Essay Scoring;
Cong Wang, Zhiwei Jiang†, Yafeng Yin, Zifeng Cheng, Shiping Ge, Qing Gu;
Annual Meeting of the Association for Computational Linguistics (ACL), 2023.
[paper] [code] [poster] [slides] [video]
TL;DR: We propose ULRA for unsupervised automated essay scoring, which uses multiple heuristic quality signals as supervision to train a neural network with a Deep Pairwise Rank Aggregation loss.
Controlling Class Layout for Deep Ordinal Classification via Constrained Proxies Learning;
Cong Wang, Zhiwei Jiang†, Yafeng Yin, Zifeng Cheng, Shiping Ge, Qing Gu;
AAAI Conference on Artificial Intelligence (AAAI), 2023.
[paper] [code] [poster] [slides] [arXiv]
TL;DR: We propose Constrained Proxies Learning for deep ordinal classification, which learns proxies for ordinal classes and adjusts their layout in feature space to capture ordinal relationships.
All Publications
Preprints
- A Debiased Nearest Neighbors Framework for Multi-Label Text Classification; Z. Cheng, Z. Jiang†, Y. Yin, Z. Chen, C. Wang, S. Ge, Q. Huang, Q. Gu; arXiv:2408.03202.
- V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation; C. Wang*, K. Tian*, J. Zhang†, Y. Guan, F. Luo, F. Shen, Z. Jiang†, Q. Gu, X. Han, W. Yang; arXiv:2406.02511.
- Ensembling Diffusion Models via Adaptive Feature Aggregation; C. Wang*, K. Tian*, Y. Guan, J. Zhang†, Z. Jiang†, F. Shen, X. Han, Q. Gu, W. Yang; arXiv:2405.17082.
2025
- IMAGDressing-v1: Customizable Virtual Dressing; F. Shen, X. Jiang, X. He, H. Ye, C. Wang, X. Du, Z. Li, J. Tang†; AAAI Conference on Artificial Intelligence (AAAI).
- Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models; F. Shen, H. Ye, S. Liu, J. Zhang†, C. Wang, X. Han, W. Yang; AAAI Conference on Artificial Intelligence (AAAI).
2024
- AP-Adapter: Improving Generalization of Automatic Prompts on Unseen Text-to-Image Diffusion Models; Y. Fu, Z. Jiang†, Y. Liu, C. Wang, Z. Deng, Z. Chen, Q. Gu; Annual Conference on Neural Information Processing Systems (NeurIPS).
- Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models; F. Shen*, H. Ye*, J. Zhang†, C. Wang, X. Han, W. Yang; International Conference on Learning Representations (ICLR).
2023
- Learning Event-Specific Localization Preferences for Audio-Visual Event Localization; S. Ge, Z. Jiang†, Y. Yin, C. Wang, Z. Cheng, Q. Gu; ACM International Conference on Multimedia (MM).
- Aggregating Multiple Heuristic Signals as Supervision for Unsupervised Automated Essay Scoring; C. Wang, Z. Jiang†, Y. Yin, Z. Cheng, S. Ge, Q. Gu; Annual Meeting of the Association for Computational Linguistics (ACL).
- Unsupervised Readability Assessment via Learning from Weak Readability Signals; Y. Liu, Z. Jiang†, Y. Yin, C. Wang, S. Chen, Z. Chen, Q. Gu; International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR).
- Learning Robust Multi-Modal Representation for Multi-Label Emotion Recognition via Adversarial Masking and Perturbation; S. Ge, Z. Jiang†, Z. Cheng, C. Wang, Y. Yin, Q. Gu; The ACM Web Conference (WWW).
- Controlling Class Layout for Deep Ordinal Classification via Constrained Proxies Learning; C. Wang, Z. Jiang†, Y. Yin, Z. Cheng, S. Ge, Q. Gu; AAAI Conference on Artificial Intelligence (AAAI).
2022
- A Consistent Dual-MRC Framework for Emotion-Cause Pair Extraction; Z. Cheng, Z. Jiang†, Y. Yin, C. Wang, S. Ge, Q. Gu; ACM Transactions on Information Systems (TOIS).
- Learning to Classify Open Intent via Soft Labeling and Manifold Mixup; Z. Cheng, Z. Jiang†, Y. Yin, C. Wang, Q. Gu; IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP).
* denotes equal contribution. † denotes the corresponding author.
Academic Services
- Journal Reviewer: TNNLS, TOMM;
- Conference Reviewer: ICLR (25), ICIC (24), MM (23), EMNLP (23).