Fangfu Liu | 刘芳甫
I am a second-year Ph.D. student in the Department of Electronic Engineering at Tsinghua University, advised by Prof. Yueqi Duan. I received my B.Eng. from the Department of Electronic Engineering, Tsinghua University, in 2023.
My research interests lie in Machine Learning and 3D Computer Vision (e.g., 3D AIGC). I aim to build spatially intelligent AI that can model the world and reason about objects, places, and interactions in 3D space and time.
If you are interested in working with us (in person or remotely) as an intern at Tsinghua University, please feel free to drop me an email.
Email / CV / Google Scholar / GitHub / Twitter
News
2024-09: Two papers on 3D Vision accepted to NeurIPS 2024.
2024-07: Three papers on 3D AIGC accepted to ECCV 2024.
2024-02: One paper on 3D AIGC accepted to CVPR 2024.
2023-05: One paper on Structure Learning accepted to KDD 2023.
2023-02: One paper on NeRF accepted to CVPR 2023.
2023-01: One paper on Causal Discovery accepted to ICLR 2023.
Preprints
*Equal contribution †Project leader
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
Wenqiang Sun*, Shuo Chen*, Fangfu Liu*, Zilong Chen, Yueqi Duan, Jun Zhang, Yikai Wang
arXiv, 2024
[arXiv] [Code] [Project Page]
In this paper, we introduce DimensionX, a framework for generating photorealistic 3D and 4D scenes from a single image with controllable video diffusion. We believe our research offers a promising direction for creating dynamic, interactive environments with video diffusion models.
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model
Fangfu Liu*, Wenqiang Sun*, Hanyang Wang*, Yikai Wang, Haowen Sun, Junliang Ye, Jun Zhang, Yueqi Duan
arXiv, 2024
[arXiv] [Code] [Project Page]
In this paper, we propose ReconX, a novel 3D scene reconstruction paradigm that reframes the ambiguous reconstruction challenge as a temporal generation task. The key insight is to unleash the strong generative prior of large pre-trained video diffusion models for sparse-view reconstruction.
DreamCinema: Cinematic Transfer with Free Camera and 3D Character
Weiliang Chen, Fangfu Liu†, Diankun Wu, Haowen Sun, Haixu Song, Yueqi Duan
arXiv, 2024
[arXiv] [Code] [Project Page]
In this paper, we propose DreamCinema, a novel cinematic transfer framework that brings generative AI into the film production paradigm, aiming to facilitate user-friendly film creation.
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion
Fangfu Liu*, Hanyang Wang*, Shunyu Yao, Shengjun Zhang, Jie Zhou, Yueqi Duan
arXiv, 2024
[arXiv] [Code] [Project Page]
In this paper, we propose Physics3D, a novel method for learning various physical properties of 3D objects through a video diffusion model. Our approach designs a highly generalizable physical simulation system based on a viscoelastic material model, enabling high-fidelity simulation of a wide range of materials.
Selected Publications
*Equal contribution †Project leader
Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image
Kailu Wu, Fangfu Liu, Zhihan Cai, Runjie Yan, Hanyang Wang, Yating Hu, Yueqi Duan, Kaisheng Ma
Conference on Neural Information Processing Systems (NeurIPS), 2024
[arXiv] [Code] [Project Page]
In this work, we introduce Unique3D, a novel image-to-3D framework for efficiently generating high-quality 3D meshes from single-view images, featuring state-of-the-art generation fidelity and strong generalizability. Unique3D can generate a high-fidelity textured mesh from a single orthographic RGB image of any object in under 30 seconds.
DreamReward: Text-to-3D Generation with Human Preference
Junliang Ye*, Fangfu Liu*, Qixiu Li, Zhengyi Wang, Yikai Wang, Xinzhou Wang, Yueqi Duan, Jun Zhu
European Conference on Computer Vision (ECCV), 2024
[arXiv] [Code] [Project Page]
In this work, we propose Reward3D, the first general-purpose human preference reward model for text-to-3D generation. Building on it, we introduce DreamReward, a novel text-to-3D framework that greatly improves text alignment and generation quality through human preference feedback.
Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation
Fangfu Liu, Hanyang Wang, Weiliang Chen, Haowen Sun, Yueqi Duan
European Conference on Computer Vision (ECCV), 2024
[arXiv] [Code] [Project Page]
We introduce Make-Your-3D, a novel 3D customization method that can personalize high-fidelity and consistent 3D content from only a single image of a subject with a text description within 5 minutes.
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior
Fangfu Liu, Diankun Wu, Yi Wei, Yongming Rao, Yueqi Duan
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
[arXiv] [Code] [Project Page]
We propose Sherpa3D, a new text-to-3D framework that achieves high fidelity, generalizability, and geometric consistency simultaneously. Extensive experiments show the superiority of Sherpa3D over state-of-the-art text-to-3D methods in terms of quality and 3D consistency.
Discovering Dynamic Causal Space for DAG Structure Learning
Fangfu Liu, Wenchang Ma, An Zhang, Xiang Wang, Yueqi Duan, Tat-Seng Chua
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2023
Oral Presentation
[arXiv] [Code] [Project Page]
We propose CASPER, a dynamic causal space for DAG structure learning that integrates the graph structure into the score function as a new measure in the causal space, faithfully reflecting the causal distance between the estimated and ground-truth DAGs.
Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention
Fangfu Liu, Chubin Zhang, Yu Zheng, Yueqi Duan
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[arXiv] [Code] [Project Page]
We propose a neural semantic representation called Semantic-Ray (S-Ray) to build a generalizable semantic field, which is able to learn from multiple scenes and directly infer semantics on novel viewpoints across novel scenes.
Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting
An Zhang, Fangfu Liu, Wenchang Ma, Zhibo Cai, Xiang Wang, Tat-Seng Chua
International Conference on Learning Representations (ICLR), 2023
[arXiv] [Code] [Project Page]
We propose ReScore, a simple-yet-effective model-agnostic optimization framework that simultaneously eliminates spurious edge learning and generalizes to heterogeneous data by utilizing learnable adaptive weights.
Honors and Awards
National Scholarship (ranked 1st of 260+ students, 2019-2020 academic year)
Tsinghua University Comprehensive Excellence Award, twice (top 5% of 260+ students, 2020 & 2021)
Tsinghua Science and Technology Innovation Excellence Award (2022)
Four-Star Bauhinia Volunteer of Tsinghua University (150 volunteer hours, 2021)
Advanced Individual Award of Tsinghua University (2019)
Academic Services
Reviewer for TCSVT, CVPR, NeurIPS, IROS, ACM MM, ICML, and ICLR.