About Me

My name is Zhenhua Yang (杨振华, Yeung Chenwa). I’m a second-year Master’s student at SCUT-DLVCLab in the School of Electronic and Information Engineering, South China University of Technology, supervised by Prof. Lianwen Jin. I received my Bachelor’s degree from the School of Automation Science and Engineering, South China University of Technology, in 2022.

I interned at the International Digital Economy Academy (IDEA) in summer 2024, supervised by Prof. Lei Zhang and working closely with Dr. Hao Zhang. It was a wonderful journey.

My research interests focus on AIGC, generative models, video understanding, vision-language models, and document understanding. I am also devoted to the open-source community.

I enjoy discussing ideas with different people. If you are interested, please feel free to $\color{#FF00FF}{contact\ me}$!

GitHub / Google Scholar / Email / Zhihu / LinkedIn

News

[07/2024] I will attend the ICML 2024 conference in person in Vienna, Austria. I am open to a discussion or to hanging out with you. 🌹🌹🌹
[06/2024] I am now interning at the International Digital Economy Academy (IDEA), supervised by Prof. Lei Zhang and working closely with Dr. Hao Zhang, on vision-language large models for video understanding.
[05/2024] Our paper UPOCR has been accepted to ICML 2024 🎉🎉🎉.
[12/2023] 🔥🔥🔥 The 📺Hugging Face Demo and the 🧑‍💻GitHub Repository of FontDiffuser are released! Welcome to check them out.
[12/2023] 🎉 Our paper FontDiffuser, which excels at generating complex characters and handling large style variations, has been accepted to AAAI 2024. The code and demo will be released soon.
[12/2023] Our paper UPOCR is released on arXiv.

Education

South China University of Technology

Sep. 2022 - Present
M.S. student at SCUT-DLVCLab in the School of Electronic and Information Engineering


South China University of Technology

Sep. 2018 - Jun. 2022
B.E. student in the School of Automation Science and Engineering

Experience

International Digital Economy Academy (IDEA) - CVR

Jun. 2024 - Sep. 2024
Research Intern
Streaming Video Captioning and Understanding / Region Caption, supervised by Prof. Lei Zhang.


INTSIG - CamScanner

Mar. 2024 - May 2024
Engineering Intern
Editing documents in real-world scenarios.

Publications

Predicting the Original Appearance of Damaged Historical Documents

Zhenhua Yang*, Dezhi Peng*, Yongxin Shi, Yuyi Zhang, Chongyu Liu, Lianwen Jin
Preprint 2024


FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

Zhenhua Yang, Dezhi Peng, Yuxin Kong, Yuyi Zhang, Cong Yao, Lianwen Jin
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2024


UPOCR: Towards Unified Pixel-Level OCR Interface

Dezhi Peng*, Zhenhua Yang*, Jiaxin Zhang, Chongyu Liu, Yongxin Shi, Lianwen Jin
International Conference on Machine Learning (ICML), 2024


HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition

Yuyi Zhang, Yuanzhi Zhu, Dezhi Peng, Peirong Zhang, Zhenhua Yang, Lianwen Jin
Pattern Recognition (PR), 2024


Censoring-aware deep ordinal regression for survival prediction from pathological images

Lichao Xiao, Jin-Gang Yu, Zhifeng Liu, Jiarong Ou, Shule Deng, Zhenhua Yang, Yuanqing Li
Medical Image Computing and Computer Assisted Intervention (MICCAI), 2020

Open-Source Projects

Optical Character Recognition with Segment Anything (OCR-SAM)

Zhenhua Yang, Qing Jiang
Can SAM be applied to OCR? We make a simple attempt at combining two off-the-shelf OCR models in MMOCR with SAM to develop several OCR-related application demos, including SAM for Text, Text Removal, and Text Inpainting. We also provide a WebUI built with Gradio for better interaction.


FontDiffuser: One-Shot Font Generation via Denoising Diffusion

Zhenhua Yang
We propose FontDiffuser, which is capable of generating unseen characters and styles and can be extended to cross-lingual generation, such as Chinese to Korean.


Recommendations of Diffusion for Text-Image

Zhenhua Yang
A paper collection of recent diffusion models for text-image generation tasks, e.g., visual text generation, font generation, text removal, text image super-resolution, text editing, handwriting generation, scene text recognition, and scene text detection.


Recommendations of Document Image Processing

Jiaxin Zhang, Zhenhua Yang
A paper collection of methods for document image processing, including appearance enhancement, deshadowing, dewarping, deblurring, and binarization.


Awards

  • Shenzhen HighPower Technology Scholarship, 2022. (Top 2%)
  • First-Class Campus Scholarship, 2021. (Top 5%)
  • Second-Class Campus Scholarship, 2020. (Top 10%)
  • American Mathematical Contest in Modeling, Meritorious Prize, 2020
  • Alibaba Tianchi Competition of Tile Defect Detection, Top 1.2%, 2021

Blogs

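Visualization Results and a Brief Analysis of SAM (Segment Anything) on OCR Text Images
Reflections on the 2020 American Mathematical Contest in Modeling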

Misc

Hobbies: I love many sports, such as fishing🎣, swimming🏊‍♂️, cycling🚲, table tennis🏓, ball games🏀⚽️, and badminton🏸, as well as singing🎤. I am currently learning to play the piano🎹.
Game Award: Our college team won first place in the campus basketball games🏀🏆 twice when I was an undergraduate, a wonderful time in my life.
Languages: Chinese, English, Cantonese, and Hakka.
Habit: A heavy coffee drinker ☕️~