One Model to Rig Them All:
Diverse Skeleton Rigging with UniRig

¹Tsinghua University, ²VAST

Corresponding Author
Diverse 3D models rigged using UniRig. The models, spanning various categories including animals, humans, and fictional characters, demonstrate the versatility of our method. Selected models are visualized with their predicted skeletons.

Abstract

The rapid evolution of 3D content creation, encompassing both AI-powered methods and traditional workflows, is driving an unprecedented demand for automated rigging solutions that can keep pace with the increasing complexity and diversity of 3D models. We introduce UniRig, a novel, unified framework for automatic skeletal rigging that leverages the power of large autoregressive models and a bone-point cross-attention mechanism to generate both high-quality skeletons and skinning weights. Unlike previous methods that struggle with complex or non-standard topologies, UniRig accurately predicts topologically valid skeleton structures thanks to a new Skeleton Tree Tokenization method that efficiently encodes hierarchical relationships within the skeleton.
To train and evaluate UniRig, we present Rig-XL, a new large-scale dataset of over 14,000 rigged 3D models spanning a wide range of categories. UniRig significantly outperforms state-of-the-art academic and commercial methods, achieving a 215% improvement in rigging accuracy and a 194% improvement in motion accuracy on challenging datasets. Our method works seamlessly across diverse object categories, from detailed anime characters to complex organic and inorganic structures, demonstrating its versatility and robustness. By automating the tedious and time-consuming rigging process, UniRig has the potential to speed up animation pipelines with unprecedented ease and efficiency.
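The abstract credits Skeleton Tree Tokenization with encoding the skeleton's hierarchy so that an autoregressive model can generate it. The sketch below shows one hypothetical way to serialize a skeleton tree into discrete tokens. It is not UniRig's actual token scheme. Joint positions in [-1, 1] are quantized into `NUM_BINS` bins, joints are emitted in depth-first order, and a joint whose parent is not the previously emitted joint is preceded by a `BRANCH` token plus a pointer token naming its parent. All names (`tokenize`, `detokenize`, `BRANCH`, `PARENT_BASE`, `NUM_BINS`) are illustrative assumptions, and the skeleton is assumed to have a single root.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical vocabulary layout (illustrative only, not UniRig's).
NUM_BINS = 256                     # quantization bins per axis; tokens 0..255
BOS, EOS, BRANCH = NUM_BINS, NUM_BINS + 1, NUM_BINS + 2
PARENT_BASE = NUM_BINS + 3         # token PARENT_BASE + k points at the k-th emitted joint

Vec3 = Tuple[float, float, float]


def quantize(v: float) -> int:
    """Map a coordinate in [-1, 1] to an integer bin in [0, NUM_BINS - 1]."""
    v = min(max(v, -1.0), 1.0)
    return min(int((v + 1.0) / 2.0 * NUM_BINS), NUM_BINS - 1)


def dequantize(b: int) -> float:
    """Return the center of bin b, mapped back to [-1, 1]."""
    return (b + 0.5) / NUM_BINS * 2.0 - 1.0


def tokenize(joints: Sequence[Vec3], parents: Sequence[int]) -> List[int]:
    """Serialize a single-rooted skeleton tree in depth-first order."""
    children: List[List[int]] = [[] for _ in joints]
    root = -1
    for j, p in enumerate(parents):
        if p < 0:
            root = j
        else:
            children[p].append(j)

    tokens = [BOS]
    order: Dict[int, int] = {}   # original joint index -> position in emission order
    stack = [root]
    prev = -1                    # last emitted joint
    while stack:
        j = stack.pop()
        p = parents[j]
        if p >= 0 and p != prev:
            # Parent is not the joint just emitted: jump back explicitly.
            tokens += [BRANCH, PARENT_BASE + order[p]]
        order[j] = len(order)
        tokens += [quantize(c) for c in joints[j]]
        prev = j
        stack.extend(reversed(children[j]))  # so the first child is visited next
    tokens.append(EOS)
    return tokens


def detokenize(tokens: Sequence[int]) -> Tuple[List[Vec3], List[int]]:
    """Rebuild joints and parent indices (in emission order) from tokens."""
    joints: List[Vec3] = []
    parents: List[int] = []
    pending_parent: Optional[int] = None
    i = 1                                   # skip BOS
    while tokens[i] != EOS:
        if tokens[i] == BRANCH:
            pending_parent = tokens[i + 1] - PARENT_BASE
            i += 2
            continue
        x, y, z = tokens[i:i + 3]
        # Without a BRANCH, the parent is the previously emitted joint
        # (or -1 for the very first joint, the root).
        parents.append(pending_parent if pending_parent is not None else len(joints) - 1)
        joints.append((dequantize(x), dequantize(y), dequantize(z)))
        pending_parent = None
        i += 3
    return joints, parents


if __name__ == "__main__":
    # Root -> spine -> head, plus two arms branching off the spine.
    joints = [(0.0, 0.0, 0.0), (0.0, 0.3, 0.0), (0.0, 0.6, 0.0),
              (-0.3, 0.4, 0.0), (0.3, 0.4, 0.0)]
    parents = [-1, 0, 1, 1, 1]
    tokens = tokenize(joints, parents)
    print(len(tokens), tokens.count(BRANCH))  # 21 2
    print(detokenize(tokens)[1])              # [-1, 0, 1, 1, 1]
```

In this toy scheme, parent-child chains are implicit, so a limb costs three tokens per joint and `BRANCH` tokens appear only where the tree forks. Decoding returns joints in emission order rather than in the caller's original index order.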

[Figure: overview of the Rig-XL dataset]

In the image above, we showcase our new large-scale dataset, Rig-XL, which contains over 14,000 rigged 3D models across diverse categories, and illustrate the distribution of skeleton types and of bone counts.

[Figure: overview of the UniRig framework]

The figure above gives an overview of the UniRig framework, which consists of two main stages: (a) Skeleton Tree Prediction and (b) Skin Weight Prediction. (a) The skeleton prediction stage takes as input a point cloud sampled from the 3D mesh, which the Shape Encoder first processes to extract geometric features. These features, along with optional class information, are fed into an autoregressive Skeleton Tree GPT that generates a token sequence representing the skeleton tree; this sequence is then decoded into a hierarchical skeleton structure. (b) The skin weight prediction stage takes the predicted skeleton tree from (a) and the point cloud as input. A Point-wise Encoder extracts features from the point cloud, while a Bone Encoder processes the skeleton tree. These features are then combined by a Bone-Point Cross Attention mechanism to predict the skinning weights and bone attributes. Finally, the predicted rig can be used to animate the mesh.
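The Bone-Point Cross Attention step can be pictured as each sampled point attending over all bones. The single-head NumPy sketch below assumes the Point-wise Encoder and Bone Encoder have already produced `point_feats` of shape (N, D) and `bone_feats` of shape (J, D). The random projections `w_q` and `w_k` stand in for learned weights, and reading the softmax attention map directly as skinning weights is our simplification, not necessarily how UniRig's prediction head works. The function name `bone_point_skin_weights` is hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def bone_point_skin_weights(point_feats: np.ndarray, bone_feats: np.ndarray,
                            w_q: np.ndarray, w_k: np.ndarray) -> np.ndarray:
    """Return an (N, J) matrix of skinning weights (hypothetical sketch).

    Queries come from points and keys from bones, so each row is a
    distribution over bones for one point.
    """
    q = point_feats @ w_q                        # (N, d_k) point queries
    k = bone_feats @ w_k                         # (J, d_k) bone keys
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (N, J) scaled dot products
    return softmax(scores, axis=-1)              # normalize over bones


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, J, D, DK = 1024, 22, 64, 32               # hypothetical sizes
    point_feats = rng.standard_normal((N, D))
    bone_feats = rng.standard_normal((J, D))
    w_q = rng.standard_normal((D, DK)) / np.sqrt(D)
    w_k = rng.standard_normal((D, DK)) / np.sqrt(D)
    weights = bone_point_skin_weights(point_feats, bone_feats, w_q, w_k)
    print(weights.shape)                          # (1024, 22)
    print(np.allclose(weights.sum(axis=1), 1.0))  # True
```

Normalizing over the bone axis guarantees that every point's weights are non-negative and sum to one, which is the convention skinning-based deformation expects.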

Results

[Figure: UniRig results on the validation dataset]

Here we show the results of UniRig on the validation dataset.

Animation Results

Each example below shows the animated mesh (left) alongside the same mesh with its predicted skeleton overlaid (right), under different animations.

Bear
Bird
Carrot
Demon
Dragon
Fish
Giraffe
Rabbit
Snipper
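To animate a mesh, a predicted rig is typically consumed by linear blend skinning (LBS), the standard deformation model in which each vertex is moved by a weighted blend of its bones' transforms. Whether the animations above were produced with exactly this model is an assumption here. The minimal NumPy sketch below assumes rest-pose vertices `verts` of shape (V, 3), skinning weights of shape (V, J) whose rows sum to one, and per-bone 4x4 matrices that already combine each bone's posed transform with its inverse bind transform. The helper name `linear_blend_skinning` is hypothetical.

```python
import numpy as np


def linear_blend_skinning(verts: np.ndarray, weights: np.ndarray,
                          bone_mats: np.ndarray) -> np.ndarray:
    """Deform rest-pose vertices with linear blend skinning.

    verts:     (V, 3) rest-pose positions
    weights:   (V, J) skinning weights, each row summing to 1
    bone_mats: (J, 4, 4) per-bone matrices mapping rest pose to posed space
               (posed bone transform times inverse bind transform)
    """
    homo = np.concatenate([verts, np.ones((verts.shape[0], 1))], axis=1)  # (V, 4)
    # Blend the bone matrices per vertex: (V, J) x (J, 4, 4) -> (V, 4, 4).
    blended = np.einsum("vj,jab->vab", weights, bone_mats)
    # Apply each vertex's blended matrix to its homogeneous position.
    posed = np.einsum("vab,vb->va", blended, homo)
    return posed[:, :3]


if __name__ == "__main__":
    verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
    weights = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
    lift = np.eye(4)
    lift[1, 3] = 1.0                     # bone 1 translates +1 along y
    bone_mats = np.stack([np.eye(4), lift])
    # The middle vertex moves up by 0.5 and the last vertex by 1.0.
    print(linear_blend_skinning(verts, weights, bone_mats))
```

Because each row of weights sums to one, blending affine matrices keeps the homogeneous coordinate at 1, so dropping it at the end is safe.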

Acknowledgements

Many thanks to our friends for their generous help with this work. I'm extremely grateful to Yuanchen, Yingtian, and Yuan Liang for their guidance on code details and ideas, and to Tira for appearing in the teaser and video :)