About Me

My name is Zhenhua Yang (杨振华, Yeung Chenwa). I am a second-year Master’s student at SCUT-DLVCLab in the School of Electronic and Information Engineering, South China University of Technology, supervised by Prof. Lianwen Jin. I received my Bachelor’s degree from the School of Automation Science and Engineering, South China University of Technology, in 2022.

My research interests focus on diffusion models, image/video generation, and document restoration. I am also devoted to the open-source community.

I enjoy discussing ideas with people from different backgrounds. If you are interested, please feel free to $\color{#FF00FF}{contact\ me}$!

GitHub / Google Scholar / Email / Zhihu / LinkedIn

News

[05/2024] Our paper UPOCR has been accepted to ICML 2024 🎉🎉🎉.
[12/2023] 🔥🔥🔥 The 📺Hugging Face Demo and the 🧑‍💻GitHub Repository of FontDiffuser have been released! Feel free to check them out.
[12/2023] 🎉 Our paper FontDiffuser, which excels in complex character generation and large style variation, has been accepted to AAAI 2024. The code and demo will be released soon.
[12/2023] Our paper UPOCR has been released on arXiv.

Education

South China University of Technology

Sep. 2022 - Present
M.S. student at SCUT-DLVCLab in the School of Electronic and Information Engineering


South China University of Technology

Sep. 2018 - Jun. 2022
B.E. student in the School of Automation Science and Engineering

Publications

FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

Zhenhua Yang, Dezhi Peng, Yuxin Kong, Yuyi Zhang, Cong Yao, Lianwen Jin
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2024

UPOCR: Towards Unified Pixel-Level OCR Interface

Dezhi Peng*, Zhenhua Yang*, Jiaxin Zhang, Chongyu Liu, Yongxin Shi, Lianwen Jin
International Conference on Machine Learning (ICML), 2024

Open-Source Projects

Optical Character Recognition with Segment Anything (OCR-SAM)

Zhenhua Yang, Qing Jiang
Can SAM be applied to OCR? We make a simple attempt to combine two off-the-shelf OCR models from MMOCR with SAM to build several OCR-related application demos, including SAM for Text, Text Removal, and Text Inpainting. We also provide a Gradio WebUI for easier interaction.


FontDiffuser: One-Shot Font Generation via Denoising Diffusion

Zhenhua Yang
We propose FontDiffuser, which is capable of generating unseen characters and styles and can be extended to cross-lingual generation, such as Chinese to Korean.


Recommendations of Diffusion for Text-Image

Zhenhua Yang
A collection of recent papers on diffusion models for text-image tasks, e.g., visual text generation, font generation, text removal, text image super-resolution, text editing, handwriting generation, scene text recognition, and scene text detection.

Awards

  • Shenzhen HighPower Technology Scholarship, 2022. (Top 2%)
  • First-Class Campus Scholarship, 2021. (Top 5%)
  • Second-Class Campus Scholarship, 2020. (Top 10%)
  • American Mathematical Contest in Modeling, Meritorious Prize, 2020
  • Alibaba Tianchi Competition of Tile Defect Detection, Top 1.2%, 2021

Blogs

Visualization Results and a Brief Analysis of SAM (Segment Anything) on OCR Text Images
Reflections on the 2020 Mathematical Contest in Modeling

Misc

Hobbies: I love many sports, such as fishing🎣, swimming🏊‍♂️, cycling🚲, table tennis🎱🏓, ball games🏀⚽️, and badminton🏸, and I also enjoy singing🎤. I am currently learning to play the piano🎹.
Game Award: Our college team won first place in the campus basketball games🏀🏆 twice when I was an undergraduate, which was a wonderful time in my life.
Languages: Chinese, English, Cantonese, and Hakka.
Habit: A heavy coffee drinker ☕️~