Zhenhua Yang

My name is Zhenhua Yang (杨振华, Yeung Chenwa).

I am a Full-time Research Algorithm Engineer in Taobao & Tmall Group of Alibaba (2025). I received my Master's and Bachelor's degree from SCUT-DLVCLab [supervised by Prof. Lianwen Jin] in South China University of Technology.

Fortunately, I have interned at:

Kling Team (advised by Xin Tao, Rui Chen, Jiarong Ou)
IDEA, Shenzhen (advised by Lei Zhang and Hao Zhang)

My research interests are focused on:

Post-Training for Multi-modal Generation and Editing
Unifying Understanding and Generation for Next Generation Model
Visual Text Generation and Editting (Text Rendering)

GitHub | Google Scholar | Email | Zhihu | LinkedIn

News

04/2026 🎉🎉🎉 Our work UniHIR is accepted by ACL 2026 Main and I am the leader in this project.
02/2025 🎉🎉🎉 Our paper HDR is selected as the Oral presentation.
05/2025 🎉🎉🎉 Our work AutoHDR is accepted by ACL 2025 Main. I am the project leader. Code released.
02/2025 🎉🎉🎉 Our paper HDR is selected as the Oral presentation.
12/2024 The inference code of our paper HDR is released at GitHub.
12/2024 Our paper HDR is accepted by AAAI 2025 🎉🎉🎉.
07/2024 I will attend the ICML 2024 conference in person in Vienna, Austria.
06/2024 Interning at IDEA, supervised by Prof. Lei Zhang, working on vision-language large models for video understanding.
05/2024 Our paper UPOCR is accepted by ICML 2024 🎉🎉🎉.
12/2023 🔥🔥🔥 The Hugging Face Demo and GitHub of FontDiffuser are released!
12/2023 🎉 Paper FontDiffuser is accepted by AAAI 2024.
12/2023 Our paper UPOCR is released to arXiv.

Education

South China University of Technology

Sep. 2022 - 2025

M.S. Student at SCUT-DLVCLab, School of Electronic and Information Engineering

South China University of Technology

Sep. 2018 - Jun. 2022

B.E. Student in School of Automation Science and Engineering

Experience

Alibaba - Taobao & Tmall Group

July 2025 - Present

Full-time AIGC Algorithm Engineer

Topic: Post-Training for Image Generation and Editing.

KuaiShou - Kling Team

Jan. 2025 - Apr. 2025

Research Intern, supervised by Xin Tao.

Topic: Unifying Understanding and Generation for Next Generation Model.

International Digital Economy Academy (IDEA) - CVR

Jun. 2024 - Sep. 2024

Research Intern, supervised by Prof. Lei Zhang.

Topic: MLLM for Streaming Video Narration and QA.

Selected Publications [Full List]

Topic: Text Rendering: From Character-level to Document-level Generation and Editing.

Predicting the Original Appearance of Damaged Historical Documents

Zhenhua Yang, Dezhi Peng, Yongxin Shi, Yuyi Zhang, Chongyu Liu, Lianwen Jin†

Proc. of the AAAI Conference on Artificial Intelligence (AAAI Oral), 2025

FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

Zhenhua Yang, Dezhi Peng, Yuxin Kong, Yuyi Zhang, Cong Yao, Lianwen Jin†

Proc. of the AAAI Conference on Artificial Intelligence (AAAI), 2024

UPOCR: Towards Unified Pixel-Level OCR Interface

Dezhi Peng*, Zhenhua Yang* (Equal Contribution), Jiaxin Zhang, Chongyu Liu, Yongxin Shi, Lianwen Jin†

International Conference on Machine Learning (ICML), 2024

Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration

Yuyi Zhang, Peirong Zhang, Zhenhua Yang* (Project Lead), et al., Lianwen Jin†

Meeting of the Association for Computational Linguistics (ACL Main), 2025

Draft, Verify, Restore: Self-Refining Historical Inscription Restoration with a Unified MLLM

Topic: Unify Content Reasoning (Understanding) and Generation

…, Zhenhua Yang* (Project Lead), et al., Lianwen Jin†

Meeting of the Association for Computational Linguistics (ACL Main), 2026

Open-Source Projects

Optical Character Recognition with Segment Anything (OCR-SAM)

Focus: Exploring SAM's zero-shot generalization in OCR tasks. Integrated SAM with MMOCR to develop specialized application demos, including precise text removal and high-fidelity text inpainting with a Gradio-based WebUI.

Flexible Diffusion-based Font Generation Framework

Focus: A robust generative framework for few-shot font stylization. Capable of synthesizing unseen characters and complex styles, supporting cross-lingual generation (e.g., Chinese to Korean) via advanced diffusion denoising.

Recommendations of Diffusion for Text-Image

Focus: A curated academic collection of state-of-the-art diffusion models for text-centric visual tasks, including visual text generation, font synthesis, and scene text recognition.

Recommendations of Document Image Processing

Focus: A comprehensive survey of document image restoration techniques, covering appearance enhancement, deshadowing, dewarping, deblurring, and binarization.

Awards

Shenzhen HighPower Technology Scholarship, 2022 (Top 2%)
First-Class Campus Scholarship, 2021 (Top 5%)
American Mathematical Contest in Modeling, Meritorious Prize, 2020
Alibaba Tianchi Competition, Top 1.2%, 2021

Mics

Hobby: 🎣 Fishing, 🏊‍♂️ Swimming, 🚲 Riding, 🏓 Table tennis, 🏀 Basketball, 🎤 Singing. Learning 🎹 Piano.
Languages: Chinese (Mandarin, Cantonese, Hakka), English.
Habit: A heavy coffee drinker ☕.