Automatic font generation is an imitation task, which aims to create a font library that mimics the style of reference images while preserving the content from source images. Although existing font generation methods have achieved satisfactory performance, they still struggle with complex characters and large style variations. To address these issues, we propose FontDiffuser, a diffusion-based image-to-image one-shot font generation method, which innovatively models the font imitation task as a noise-to-denoise paradigm. In our method, we introduce a Multi-scale Content Aggregation (MCA) block, which effectively combines global and local content cues across different scales, leading to enhanced preservation of intricate strokes of complex characters. Moreover, to better manage the large variations in style transfer, we propose a Style Contrastive Refinement (SCR) module, which is a novel structure for style representation learning. It utilizes a style extractor to disentangle styles from images, subsequently supervising the diffusion model via a meticulously designed style contrastive loss. Extensive experiments demonstrate FontDiffuser’s state-of-the-art performance in generating diverse characters and styles. It consistently excels on complex characters and large style changes compared to previous methods.
Overview of our proposed method. (a) The conditional diffusion model is a UNet-based network composed of a content encoder Ec and a style encoder Es. The reference image Xs is passed through both the style encoder Es and the content encoder Ec, yielding a style embedding e and structure maps Fs, respectively. The source image is encoded by the content encoder Ec. To obtain multi-scale content features Fc, we take the outputs of different layers of Ec and inject each of them into the UNet through our proposed MCA block. The Reference-Structure Interaction (RSI) block performs spatial deformation using the reference structural features Fs. (b) The Style Contrastive Refinement (SCR) module disentangles styles from images and provides guidance to the diffusion model.
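To make the idea of supervising generation with a style contrastive loss more concrete, the following is a minimal sketch of an InfoNCE-style objective in plain Python. It is an illustration under stated assumptions, not the loss used in the paper: the function names, the use of cosine similarity, the single-vector style embeddings, and the temperature value are all hypothetical choices. The intent is only to show the general shape of such a loss, where the generated sample's style embedding is pulled toward the embedding of the target style and pushed away from embeddings of other styles.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def style_contrastive_loss(anchor, positive, negatives, temperature=0.07):
    """InfoNCE-style loss (hypothetical helper, not the paper's exact formulation).

    anchor:    style embedding of the generated sample
    positive:  style embedding of an image in the target style
    negatives: style embeddings of images in other styles
    """
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over the positive and all negatives.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(l - max_logit) for l in logits)
    )
    # Equivalent to -log(softmax probability assigned to the positive).
    return log_denominator - pos_logit


# Toy 2-D embeddings, chosen only for illustration.
anchor = [1.0, 0.0]
same_style = [0.9, 0.1]
other_styles = [[0.0, 1.0], [-1.0, 0.0]]

loss_matched = style_contrastive_loss(anchor, same_style, other_styles)
loss_mismatched = style_contrastive_loss(anchor, [0.0, 1.0], [same_style, [-1.0, 0.0]])
print(loss_matched < loss_mismatched)
```

In this toy setup the loss is lower when the positive embedding actually resembles the anchor's style, which is the behaviour a contrastive style objective is meant to reward. In a real system the embeddings would come from a learned style extractor rather than hand-written vectors.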
@inproceedings{yang2024fontdiffuser,
title={FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning},
author={Yang, Zhenhua and Peng, Dezhi and Kong, Yuxin and Zhang, Yuyi and Yao, Cong and Jin, Lianwen},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2024}
}