About Me
My name is Zhenhua Yang (杨振华, Yeung Chenwa). I’m a second-year Master’s student at SCUT-DLVCLab in the School of Electronic and Information Engineering, South China University of Technology, supervised by Prof. Lianwen Jin. I received my Bachelor’s degree from the School of Automation Science and Engineering, South China University of Technology, in 2022.
My research interests focus on diffusion models, image/video generation, and document restoration. I am also devoted to the open-source community.
I enjoy exchanging ideas with people from different backgrounds. If you are interested, please feel free to $\color{#FF00FF}{contact\ me}$!
GitHub / Google Scholar / Email / Zhihu / LinkedIn
News
∙ [05/2024] Our paper UPOCR has been accepted to ICML 2024 🎉🎉🎉.
∙ [12/2023] 🔥🔥🔥 The 📺Hugging Face Demo and the 🧑💻GitHub Repository of FontDiffuser are released! Welcome to check them out.
∙ [12/2023] 🎉 Our paper FontDiffuser, which excels at complex character generation and large style variations, has been accepted to AAAI 2024. The code and demo will be released soon.
∙ [12/2023] Our paper UPOCR has been released on arXiv.
Education
South China University of Technology
Sep. 2022 - Present
M.S. student at SCUT-DLVCLab in the School of Electronic and Information Engineering
South China University of Technology
Sep. 2018 - Jun. 2022
B.E. student in the School of Automation Science and Engineering
Publications
FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning
Zhenhua Yang, Dezhi Peng, Yuxin Kong, Yuyi Zhang, Cong Yao, Lianwen Jin†
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2024
UPOCR: Towards Unified Pixel-Level OCR Interface
Dezhi Peng*, Zhenhua Yang*, Jiaxin Zhang, Chongyu Liu, Yongxin Shi, Lianwen Jin†
International Conference on Machine Learning (ICML), 2024
Open-Source Projects
Optical Character Recognition with Segment Anything (OCR-SAM)
Zhenhua Yang, Qing Jiang
Can SAM be applied to OCR? We make a simple attempt at combining two off-the-shelf OCR models in MMOCR with SAM to develop several OCR-related application demos, including SAM for Text, Text Removal, and Text Inpainting. We also provide a Gradio WebUI for easier interaction.
FontDiffuser: One-Shot Font Generation via Denoising Diffusion
Zhenhua Yang
We propose FontDiffuser, which is capable of generating unseen characters and styles and can be extended to cross-lingual generation, such as Chinese to Korean.
Recommendations of Diffusion for Text-Image
Zhenhua Yang
A collection of recent papers on diffusion models for text-image tasks, e.g., visual text generation, font generation, text removal, text image super-resolution, text editing, handwriting generation, scene text recognition, and scene text detection.
Awards
- Shenzhen HighPower Technology Scholarship, 2022. (Top 2%)
- First-Class Campus Scholarship, 2021. (Top 5%)
- Second-Class Campus Scholarship, 2020. (Top 10%)
- American Mathematical Contest in Modeling, Meritorious Prize, 2020
- Alibaba Tianchi Competition of Tile Defect Detection, Top 1.2%, 2021
Blogs
Visualization Results and a Brief Analysis of SAM (Segment Anything) on OCR Text Images
Reflections on the 2020 Mathematical Contest in Modeling
Misc
Hobbies: I love many sports, such as fishing🎣, swimming🏊♂️, cycling🚲, table tennis🎱🏓, ball games🏀⚽️, and badminton🏸, and I also enjoy singing🎤. I am currently learning to play the piano🎹.
Game Award: Our college team won first place in the campus basketball games🏀🏆 twice when I was an undergraduate, which was one of the most wonderful times of my life.
Languages: Chinese, English, Cantonese, and Hakka.
Habit: A heavy coffee drinker ☕️~