Zhenhua Yang
My name is Zhenhua Yang (杨振华, Yeung Chenwa).
I am a Full-time Algorithm Engineer in Taobao & Tmall Group of Alibaba (2025). I received my Master's and Bachelor's degree from SCUT-DLVCLab [supervised by Prof. Lianwen Jin] in South China University of Technology.
Fortunately, I have interned at the Kling Team (advised by Xin Tao) and IDEA, Shenzhen (advised by Lei Zhang and Hao Zhang).
My research interests are focused on:
- Post-Training for Multi-modal Generation and Editing
- Unifying Understanding and Generation for Next Generation Model
- Visual Text Generation and Editting (Text Rendering)
News
- 05/2025 🎉🎉🎉 Our work AutoHDR is accepted by ACL 2025 Main. I am the project leader. Code released.
- 02/2025 🎉🎉🎉 Our paper HDR is selected as the Oral presentation.
- 12/2024 The inference code of our paper HDR is released at GitHub.
- 12/2024 Our paper HDR is accepted by AAAI 2025 🎉🎉🎉.
- 07/2024 I will attend the ICML 2024 conference in person in Vienna, Austria.
- 06/2024 Interning at IDEA, supervised by Prof. Lei Zhang, working on vision-language large models for video understanding.
- 05/2024 Our paper UPOCR is accepted by ICML 2024 🎉🎉🎉.
- 12/2023 🔥🔥🔥 The Hugging Face Demo and GitHub of FontDiffuser are released!
- 12/2023 🎉 Paper FontDiffuser is accepted by AAAI 2024.
- 12/2023 Our paper UPOCR is released to arXiv.
Education

South China University of Technology
Sep. 2022 - 2025
M.S. Student at SCUT-DLVCLab, School of Electronic and Information Engineering

South China University of Technology
Sep. 2018 - Jun. 2022
B.E. Student in School of Automation Science and Engineering
Experience

Alibaba - Taobao & Tmall Group
July 2025 - Present
Full-time AIGC Algorithm Engineer
Post-Training for Image Generation and Editing.

KuaiShou - Kling Team
Jan. 2025 - Apr. 2025
Research Intern, supervised by Xin Tao.
Topic: Unifying Understanding and Generation for Next Generation Model.

International Digital Economy Academy (IDEA) - CVR
Jun. 2024 - Sep. 2024
Research Intern, supervised by Prof. Lei Zhang.
Topic: MLLM for Streaming Video Narration and QA.
Selected Publications [Full List]





Draft, Verify, Restore: Self-Refining Historical Inscription Restoration with a Unified MLLM
Topic: Unify Content Reasoning (Understanding) and Generation
…, Zhenhua Yang* (Project Lead), et al., Lianwen Jin†
Submitted to Meeting of the Association for Computational Linguistics (ACL), 2026
Open-Source Projects
Focus: Exploring SAM's zero-shot generalization in OCR tasks. Integrated SAM with MMOCR to develop specialized application demos, including precise text removal and high-fidelity text inpainting with a Gradio-based WebUI.
Focus: A robust generative framework for few-shot font stylization. Capable of synthesizing unseen characters and complex styles, supporting cross-lingual generation (e.g., Chinese to Korean) via advanced diffusion denoising.
Focus: A curated academic collection of state-of-the-art diffusion models for text-centric visual tasks, including visual text generation, font synthesis, and scene text recognition.
Focus: A comprehensive survey of document image restoration techniques, covering appearance enhancement, deshadowing, dewarping, deblurring, and binarization.
Awards
- Shenzhen HighPower Technology Scholarship, 2022 (Top 2%)
- First-Class Campus Scholarship, 2021 (Top 5%)
- American Mathematical Contest in Modeling, Meritorious Prize, 2020
- Alibaba Tianchi Competition, Top 1.2%, 2021
Mics
Hobby: 🎣 Fishing, 🏊♂️ Swimming, 🚲 Riding, 🏓 Table tennis, 🏀 Basketball, 🎤 Singing. Learning 🎹 Piano.
Languages: Chinese (Mandarin, Cantonese, Hakka), English.
Habit: A heavy coffee drinker ☕.
