ACM Transactions on Graphics · SIGGRAPH 2026
Reusable Score-Matching Motion Priors
for Physics-Based Character Control
Pretrained motion diffusion models are repurposed as reward models to train policies for 100+ styles, diverse tasks, object interactions, and real-world robots.
A single frozen diffusion model serves as the reward model across many tasks and styles, from locomotion and steering to dodgeball and zombie-walk.
A motion prior is trained once, independently of any task or policy, and serves as a reward model without ever accessing the original motion data.
A general pretrained motion prior can be shaped or composed to synthesize new styles not present in the original dataset without further training.
Data-driven motion priors that can guide agents toward producing naturalistic behaviors play a pivotal role in creating life-like virtual characters. Adversarial imitation learning has been a highly effective method for learning motion priors from reference motion data. However, adversarial priors, with few exceptions, need to be retrained for each new controller, thereby limiting their reusability and necessitating the retention of the reference motion data when applied to downstream tasks.
In this work, we present Score-Matching Motion Priors (SMP), which leverages pre-trained motion diffusion models and score distillation sampling (SDS) to create reusable task-agnostic motion priors. SMPs can be pre-trained on a motion dataset, independent of any control policy or task. Once trained, they can be frozen and reused as general-purpose reward functions to train policies to produce naturalistic behaviors for downstream tasks.
We show that a general motion prior trained on large-scale datasets can be repurposed into a variety of style-specific priors. Furthermore, SMP can compose different styles to synthesize new styles not present in the original dataset. Our method creates reusable and modular motion priors that produce high-quality motions comparable to state-of-the-art adversarial imitation learning methods. We demonstrate the effectiveness of SMP across a diverse suite of control tasks with physically simulated humanoid characters.
A motion diffusion model is trained on a reference motion dataset. During policy training, the pretrained diffusion model is repurposed as a reward model: it compares the noise added to the agent’s motion against the noise it predicts, and the residual measures how far the behavior is from the reference data manifold.
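As a concrete illustration, the sketch below shows how such a score-matching reward could be computed from a frozen epsilon-prediction denoiser. The names and interface (`sds_error`, `denoiser`, `alphas_cumprod`) are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def sds_error(motion, denoiser, t, alphas_cumprod):
    """Residual between added and predicted noise at timestep t.

    motion:          (B, T, D) window of simulated character states.
    denoiser:        frozen epsilon-prediction network eps_hat(x_t, t).
    alphas_cumprod:  (num_timesteps,) cumulative noise schedule.
    """
    a_bar = alphas_cumprod[t]
    noise = torch.randn_like(motion)
    # Forward-diffuse the agent's motion to noise level t.
    x_t = a_bar.sqrt() * motion + (1.0 - a_bar).sqrt() * noise
    with torch.no_grad():
        eps_hat = denoiser(x_t, t)
    # Small residual => the motion lies near the reference data manifold.
    return (eps_hat - noise).pow(2).mean(dim=(1, 2))  # (B,)

def sds_reward(motion, denoiser, t, alphas_cumprod):
    # Map the unbounded error to a bounded per-step reward.
    return torch.exp(-sds_error(motion, denoiser, t, alphas_cumprod))
```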
Instead of evaluating SDS at a single random diffusion timestep, SMP aggregates SDS losses over a fixed set of timesteps. This substantially reduces variance without introducing significant bias, thereby stabilizing PPO training.
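A minimal sketch of this aggregation, reusing `sds_error` from the snippet above; the particular timestep set is an assumed hyperparameter.

```python
def aggregated_sds_reward(motion, denoiser, timesteps, alphas_cumprod):
    # Evaluate the SDS error at a fixed set of noise levels and average,
    # rather than sampling a single random t; the mean over timesteps has
    # much lower variance, which keeps the PPO reward signal stable.
    errs = torch.stack([
        sds_error(motion, denoiser, t, alphas_cumprod) for t in timesteps
    ])  # (len(timesteps), B)
    return torch.exp(-errs.mean(dim=0))

# Example: aggregate over a small, evenly spaced set of noise levels.
# reward = aggregated_sds_reward(motion, denoiser, [100, 300, 500, 700], alphas_cumprod)
```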
SDS magnitudes differ sharply across noise levels and can vary across diffusion model checkpoints. We normalize each timestep’s SDS error by its running mean, substantially reducing the burden of manual hyperparameter tuning.
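One simple way to realize this normalization is an exponential running mean per timestep, as sketched below; the momentum and epsilon values are assumptions.

```python
class RunningMeanNormalizer:
    """Per-timestep running mean of SDS errors; dividing by it puts all
    noise levels (and different model checkpoints) on a comparable scale."""

    def __init__(self, num_timesteps, momentum=0.99):
        self.mean = torch.ones(num_timesteps)
        self.momentum = momentum

    def __call__(self, err, t):
        # err: (B,) SDS errors at timestep t for the current batch.
        batch_mean = err.mean().item()
        self.mean[t] = self.momentum * self.mean[t] + (1.0 - self.momentum) * batch_mean
        return err / (self.mean[t] + 1e-8)
```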
The same diffusion model is sampled to generate initial states for training. This recovers the exploration benefits of reference state initialization without retaining the original motion dataset.
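A sketch of how initial states could be drawn from the same frozen prior, using a deterministic DDIM-style sampler for brevity; the sampler choice and interface are assumptions.

```python
@torch.no_grad()
def sample_initial_states(denoiser, alphas_cumprod, batch, horizon, dim):
    """Sample motion windows from the frozen prior and return their first
    frames as episode reset states (no reference dataset required)."""
    x = torch.randn(batch, horizon, dim)
    for t in reversed(range(len(alphas_cumprod))):
        a_bar = alphas_cumprod[t]
        a_bar_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        eps_hat = denoiser(x, t)
        # DDIM step (eta = 0): predict x0, then move to the next noise level.
        x0 = (x - (1.0 - a_bar).sqrt() * eps_hat) / a_bar.sqrt()
        x = a_bar_prev.sqrt() * x0 + (1.0 - a_bar_prev).sqrt() * eps_hat
    return x[:, 0]  # (batch, dim) initial character states
```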
We train a single style-conditioned diffusion model on the 20-hour 100STYLE dataset. Using classifier-free guidance, the same frozen model is reshaped into distinct style-specific priors — no style-specific datasets, no discriminator retraining. Diffusion model predictions conditioned on different styles can be composed to synthesize styles that don’t exist in the original dataset, such as Elated + FlickLegs and GracefulArms + Spin.
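The composition amounts to mixing classifier-free-guidance directions from several style conditions. A sketch follows, with the guidance scale and the `style` keyword as assumed interface details.

```python
def composed_eps(denoiser, x_t, t, styles, weights, guidance_scale=2.5):
    """Compose style-conditioned noise predictions: start from the
    unconditional prediction and add one guided direction per style,
    e.g. styles=['Elated', 'FlickLegs'], weights=[0.5, 0.5]."""
    eps_uncond = denoiser(x_t, t, style=None)
    eps = eps_uncond.clone()
    for style, w in zip(styles, weights):
        eps_cond = denoiser(x_t, t, style=style)
        eps = eps + guidance_scale * w * (eps_cond - eps_uncond)
    return eps
```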
A single motion prior trained on the LaFAN1 locomotion dataset is reused to train policies for different tasks. The character automatically discovers gait transitions — backward jogs, side-steps, slow walks near targets — and, for dodgeball, develops agile jumping and dodging skills that do not explicitly appear in the reference data.
Following a target heading and target velocity. Side-steps and cross-stepping gaits arise when the two directions disagree.
Reaching 2D target positions. The character speeds up when far from the target and transitions smoothly into walking as it approaches the target.
Balls are launched at up to 25 m/s from no more than 10 m away, leaving less than 0.5 s to react. Agile dodging skills resembling human-like strategies emerge purely from locomotion data.
SMP extends naturally to joint human–object priors by modeling character and object motions together. This enables complex tasks such as picking up a box and climbing stairs, with coordinated whole-body manipulation and stable foot contacts.
Walk to a box, pick it up, then carry it to an arbitrary target location. The prior jointly models character and object motions.
Using a human–scene interaction prior, the character ascends and descends a staircase with natural foot placement.
SMP works even when reference data is scarce. With only three seconds of walking, jogging, and running clips, the character learns to continuously modulate its gait across a wide speed range — developing transitions such as walk→jog→sprint that were never in the dataset.
An SMP-trained policy is deployed directly on a Unitree G1 humanoid. With asymmetric actor-critic training, domain randomization, and proprioception-only observations, the robot exhibits natural locomotion, robust recovery from external perturbations, and agile motion skills.
A complete walkthrough of the paper — background, motivation, method, and additional qualitative results across all task settings.
@article{mu2026smp,
title = {SMP: Reusable Score-Matching Motion Priors for Physics-Based Character Control},
author = {Mu, Yuxuan and Zhang, Ziyu and Shi, Yi and Yang, Dun and
Matsumoto, Minami and Imamura, Kotaro and Tevet, Guy and
Guo, Chuan and Taylor, Michael and Shu, Chang and
Xi, Pengcheng and Peng, Xue Bin},
journal = {ACM Transactions on Graphics (Proceedings of SIGGRAPH 2026)},
year = {2026},
}