Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation

Fangfu Liu, Hanyang Wang, Weiliang Chen, Haowen Sun, Yueqi Duan
Tsinghua University

Make-Your-3D is a new subject-driven 3D generation framework that personalizes your 3D assets from only a single in-the-wild image.

Abstract

Recent years have witnessed the remarkable progress of 3D generation models, which offer a new level of creative flexibility by allowing users to guide the 3D content creation process with a single image or natural language. However, it remains challenging for existing 3D generation methods to create subject-driven 3D content across diverse prompts. In this paper, we introduce a novel 3D customization method, dubbed Make-Your-3D, that can personalize high-fidelity and consistent 3D content from only a single image of a subject with a text description within 5 minutes. Our key insight is to harmonize the distributions of a multi-view diffusion model and an identity-specific 2D generative model, aligning them with the distribution of the desired 3D subject. Specifically, we design a co-evolution framework to reduce the variance between the two distributions, where each model learns from the other through identity-aware optimization and subject-prior optimization, respectively. Extensive experiments demonstrate that our method produces high-quality, consistent, and subject-specific 3D content with text-driven modifications that are unseen in the subject image.

Figure 1. Make-Your-3D can personalize 3D content from only a single image of a subject with text-driven modifications within 5 minutes, which is 36× faster than DreamBooth3D.


Subject-Driven 3D Generation

Visual results of Make-Your-3D on different subjects with customized text inputs. The multi-view results demonstrate that our method generates 3D assets with high fidelity, 3D consistency, subject preservation, and faithfulness to the text prompts.


"Gelute" Personalization

Textured meshes and normal maps on a subject "Gelute" with various customized text inputs.


Qualitative Comparison

Qualitative comparisons with DreamBooth3D. We use the same text prompt and only one of the input images used by DreamBooth3D. Note that our method performs better on object details despite using fewer input images.


Comparisons with the failure cases of DreamBooth3D. While DreamBooth3D fails to reconstruct thin object structures such as sunglasses and suffers from limited view variation, our method achieves significant improvements in the fine details of thin objects and enables fast 3D personalization from a single subject image.


Method

The overall framework of our proposed Make-Your-3D. Our framework combines identity-aware optimization of the 2D personalized model with subject-prior optimization of the multi-view diffusion model to approximate the subject distribution. The identity-aware optimization lifts the input image into 3D space through a frozen multi-view diffusion model and optimizes the 2D personalized model on the resulting multi-view images. The subject-prior optimization uses diverse images sampled from the frozen personalized model to infuse the subject-specific prior into the multi-view diffusion model.
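
To make the two-stage pipeline concrete, below is a minimal PyTorch-style sketch of the co-evolution loop. The wrappers mv_model and p2d_model, and their methods generate_views, sample, denoising_loss, and trainable_parameters, are hypothetical placeholders for illustration, not the paper's actual API; the real models, losses, and schedules follow the paper.

import random
import torch

def co_evolve(subject_image, text_prompt, mv_model, p2d_model,
              steps=500, lr=1e-4):
    # Stage 1: identity-aware optimization of the 2D personalized model.
    # Lift the single input image into 3D space with the frozen
    # multi-view diffusion model (names here are illustrative).
    with torch.no_grad():
        multi_views = mv_model.generate_views(subject_image)  # pseudo multi-view images

    opt_2d = torch.optim.AdamW(p2d_model.trainable_parameters(), lr=lr)
    for _ in range(steps):
        view = random.choice(multi_views)
        loss = p2d_model.denoising_loss(view, text_prompt)  # standard diffusion loss
        opt_2d.zero_grad()
        loss.backward()
        opt_2d.step()

    # Stage 2: subject-prior optimization of the multi-view diffusion model.
    # Sample diverse subject images from the now-frozen personalized model
    # and use them to infuse the subject prior into the multi-view model.
    with torch.no_grad():
        diverse_images = [p2d_model.sample(text_prompt) for _ in range(16)]

    opt_mv = torch.optim.AdamW(mv_model.trainable_parameters(), lr=lr)
    for _ in range(steps):
        image = random.choice(diverse_images)
        loss = mv_model.denoising_loss(image, text_prompt)
        opt_mv.zero_grad()
        loss.backward()
        opt_mv.step()

    return mv_model, p2d_model

The key design choice reflected in the sketch is that each stage keeps the other model frozen, so the two distributions are pulled toward the desired subject without drifting apart.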


BibTeX

@article{liu2024make,
  title={Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation},
  author={Liu, Fangfu and Wang, Hanyang and Chen, Weiliang and Sun, Haowen and Duan, Yueqi},
  journal={arXiv preprint arXiv:2403.09625},
  year={2024}
}