Yang Li

I'm a research scientist at Tencent XR Vision Labs.

I completed my Ph.D. with Tatsuya Harada at The University of Tokyo. During my Ph.D., I did an internship at the Technical University of Munich with Matthias Nießner and an internship with Bo Zheng at Huawei Japan Research Center. Before that, I earned a master's degree in bioinformatics with Tetsuo Shibuya at The University of Tokyo.

My research interests lie at the intersection of 3D computer vision and artificial intelligence, with a particular focus on registration, 3D/4D reconstruction, and 3D AIGC, and with applications in VR/AR, robotics, and beyond.

We have research internship positions on 3D AIGC in Shanghai. Feel free to contact us!

Email / Scholar / GitHub

profile photo

Research

Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches
Yongzhi Xu, Yonhon Ng, Yifu Wang, Inkyu Sa, Yunfei Duan, Yang Li, Pan Ji, Hongdong Li
arXiv preprint 2024
Project Page | Paper | Video

This paper proposes a novel approach for automatically generating interactive (i.e., playable) 3D game scenes from users' casual prompts, including hand-drawn sketches and text descriptions.

Frankenstein: Generating Semantic-Compositional 3D Scenes in One Tri-Plane
Han Yan, Yang Li, Zhennan Wu, Shenzhou Chen, Weixuan Sun, Taizhang Shang, Weizhe Liu, Tian Chen, Xiaqiang Dai, Chao Ma, Hongdong Li, Pan Ji
SIGGRAPH Asia 2024
Project Page | Paper | Video

We present Frankenstein, a diffusion-based framework that can generate semantic-compositional 3D scenes in a single pass. Unlike existing methods that output a single, unified 3D shape, Frankenstein simultaneously generates multiple separated shapes, each corresponding to a semantically meaningful part.

BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation
Zhennan Wu, Yang Li, Han Yan, Taizhang Shang, Weixuan Sun, Senbo Wang, Ruikai Cui, Weizhe Liu, Hiroyuki Sato, Hongdong Li, and Pan Ji
ACM Transactions on Graphics 2024   (Selected as SIGGRAPH 2024 Trailer Video)
Project Page | Paper | Video | Code

We introduce the first 3D-diffusion-based approach for directly generating large, unbounded 3D scenes in both indoor and outdoor scenarios. At the core of this approach is a novel tri-plane diffusion and tri-plane extrapolation mechanism.

NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation
Ruikai Cui, Weizhe Liu, Weixuan Sun, Senbo Wang, Taizhang Shang, Yang Li, Xibin Song, Han Yan, Zhennan Wu, Shenzhou Chen, Hongdong Li, Pan Ji
ECCV 2024
Project Page | Paper

In this paper, we introduce a novel spatial-aware 3D shape generation framework that leverages 2D plane representations for enhanced 3D shape modeling.

3D Segmenter: 3D Transformer based Semantic Segmentation via 2D Panoramic Distillation
Zhennan Wu, Yang Li, Yifei Huang, Lin Gu, Tatsuya Harada, and Hiroyuki Sato
ICLR 2023
Paper | Code

We propose the first 2D-to-3D knowledge distillation strategy to enhance 3D semantic segmentation models with knowledge embedded in the latent space of powerful 2D models.

Non-rigid Point Cloud Registration with Neural Deformation Pyramid
Yang Li and Tatsuya Harada
NeurIPS 2022
Paper | Code

Neural Deformation Pyramid (NDP) breaks down the non-rigid point cloud registration problem via hierarchical motion decomposition, demonstrating advantages in both speed and registration accuracy.

Lepard: Learning partial point cloud matching in rigid and deformable scenes
Yang Li and Tatsuya Harada
CVPR 2022   (Oral Presentation)
Paper | Video | Code

We design Lepard, a novel partial point cloud matching method that exploits 3D positional knowledge. Lepard achieves state-of-the-art results on both rigid and deformable point cloud matching benchmarks.

4DComplete: Non-Rigid Motion Estimation Beyond the Observable Surface
Yang Li, Hiraki Takehara, Takafumi Taketomi, Bo Zheng, and Matthias Nießner
ICCV 2021
Paper | Video | Code

We introduce 4DComplete, the first method that jointly recovers the shape and motion field from partial observations. We also provide a large-scale non-rigid 4D dataset for training and benchmarking, consisting of 1,972 animation sequences and 122,365 frames.

SplitFusion: Simultaneous Tracking and Mapping for Non-Rigid Scenes
Yang Li, Tianwei Zhang, Yoshihiko Nakamura, and Tatsuya Harada
IROS 2020
Paper | Video

SplitFusion is a dense RGB-D SLAM framework that simultaneously performs tracking and dense reconstruction for both rigid and non-rigid components of the scene.

Learning to Optimize Non-Rigid Tracking
Yang Li, Aljaž Božič, Tianwei Zhang, Yanli Ji, Tatsuya Harada, and Matthias Nießner
CVPR 2020   (Oral Presentation)
Paper | Video

We learn to track non-rigid objects by differentiating through the underlying non-rigid solver. Specifically, we propose ConditionNet, which learns to generate a problem-specific preconditioner using a large number of training samples from the Gauss-Newton update equation. The learned preconditioner significantly increases the convergence speed of the preconditioned conjugate gradient (PCG) solver.

FlowFusion: Dynamic Dense RGB-D SLAM Based on Optical Flow
Tianwei Zhang, Huayan Zhang, Yang Li, Yoshihiko Nakamura, and Lei Zhang
ICRA 2020
Paper | Video

We present a novel dense RGB-D SLAM solution that simultaneously accomplishes dynamic/static segmentation, camera ego-motion estimation, and static background reconstruction.

Pose Graph Optimization for Unsupervised Monocular Visual Odometry
Yang Li, Yoshitaka Ushiku, and Tatsuya Harada
ICRA 2019
Paper

We propose to leverage graph optimization and loop closure detection to overcome the limitations of unsupervised learning-based monocular visual odometry.


This website is based on this source code.