StableMotion: Training Motion Cleanup Models with Unpaired Corrupted Data

SIGGRAPH Asia 2025
1Simon Fraser University  2Electronic Arts  3National Research Council Canada  4NVIDIA

TL;DR — 🧑‍🔧 You don’t need a clean dataset to train a motion cleanup model.
StableMotion learns to fix corrupted motions directly from raw mocap data — no handcrafted data pairs, no synthetic artifact augmentation.



Motion capture (mocap) data often exhibits visually jarring artifacts due to inaccurate sensors and post-processing. Cleaning this corrupted data can require substantial manual effort from human experts, which is costly and time-consuming. Previous data-driven motion cleanup methods offer the promise of automating this cleanup process, but often require in-domain paired corrupted-to-clean training data. Constructing such paired datasets requires access to high-quality, relatively artifact-free motion clips, which often necessitates laborious manual cleanup. In this work, we present StableMotion, a simple yet effective method for training motion cleanup models directly from unpaired corrupted datasets that need cleanup. The core component of our method is the introduction of motion quality indicators, which can be easily annotated (through manual labeling or heuristic algorithms) and enable training of quality-aware motion generation models on raw motion data of mixed quality. At test time, the model can be prompted to generate high-quality motions using the quality indicators. Our method can be implemented through a simple diffusion-based framework, leading to a unified motion generate-discriminate model, which can be used to both identify and fix corrupted frames. We demonstrate that our method is effective for training motion cleanup models on raw mocap data in production scenarios by applying StableMotion to SoccerMocap, a 245-hour soccer mocap dataset containing real-world motion artifacts. The trained model effectively corrects a wide range of motion artifacts, reducing motion pops and frozen frames by 68% and 81%, respectively. On our benchmark dataset, we further show that cleanup models trained with our method on unpaired corrupted data outperform state-of-the-art methods trained on clean or paired data, while achieving comparable performance in preserving the content of the original motion clips.


The Method


Inspired by state-return trajectory modeling in offline RL, we incorporate a frame-level quality indicator variable (QualVar), analogous to the per-state reward in RL. Our StableMotion framework adopts a generate-discriminate approach, where a single model is jointly trained to evaluate motion quality and to generate motion of varying quality conditioned on QualVar.
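
To make the joint formulation concrete, here is a minimal PyTorch sketch in which the diffusion model denoises motion frames concatenated with a per-frame QualVar channel, so one network learns to model both. The architecture, shapes, x0-prediction objective, and hyperparameters are our assumptions for illustration, not the exact design from the paper.

import torch
import torch.nn as nn

class GenDiscDenoiser(nn.Module):
    """Denoiser over sequences of [motion || QualVar] features (hypothetical
    shapes). One network models both channels, so it can later be prompted
    to predict quality (discriminate) or motion (generate)."""
    def __init__(self, motion_dim=63, hidden=256, layers=4, num_steps=1000):
        super().__init__()
        self.in_proj = nn.Linear(motion_dim + 1, hidden)   # +1 QualVar channel
        block = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(block, layers)
        self.t_embed = nn.Embedding(num_steps, hidden)     # diffusion-step embedding
        self.out_proj = nn.Linear(hidden, motion_dim + 1)

    def forward(self, x_t, t):
        h = self.in_proj(x_t) + self.t_embed(t)[:, None, :]
        return self.out_proj(self.backbone(h))             # predicts clean x0

def diffusion_loss(model, motion, qual, alphas_bar):
    """motion: [B, T, D] raw clips of mixed quality; qual: [B, T, 1] per-frame
    quality labels from manual annotation or heuristics (no clean targets);
    alphas_bar: [num_steps] cumulative noise schedule."""
    x0 = torch.cat([motion, qual], dim=-1)
    t = torch.randint(0, len(alphas_bar), (x0.shape[0],), device=x0.device)
    ab = alphas_bar[t][:, None, None]
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * torch.randn_like(x0)
    return ((model(x_t, t) - x0) ** 2).mean()

Because QualVar lives in the same sequence as the motion features, quality estimation and motion generation become two inpainting patterns over a single model, rather than two separate networks, and training requires only raw clips plus cheap per-frame quality labels.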

Similar to the practice of prompting a text-to-image model with “photo-realistic quality”, QualVar offers a knob for specifying generation quality. This allows the model to clean up raw mocap data by first identifying corrupted segments and then inpainting them with high-quality motion.
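
In code, this two-stage cleanup might look like the sketch below, where sample_fn is an assumed diffusion-inpainting sampler built on the trained model; the channel layout, mask convention, and 0.5 threshold are illustrative choices, not the paper's exact interface.

import torch

QUAL_GOOD = 1.0

def cleanup(sample_fn, motion):
    """Two-stage cleanup (sketch). `sample_fn(known, keep_mask)` is an assumed
    inpainting sampler: entries with keep_mask == 1 are held fixed at `known`,
    the rest are generated; it returns a full [motion || QualVar] sequence."""
    B, T, _ = motion.shape
    x = torch.cat([motion, torch.zeros(B, T, 1)], dim=-1)

    # Stage 1 (discriminate): hold the motion channels, inpaint QualVar.
    keep = torch.ones_like(x)
    keep[..., -1] = 0
    qual = sample_fn(x, keep)[..., -1]
    bad = qual < 0.5                      # frames flagged as corrupted

    # Stage 2 (generate): prompt QualVar to "good", regenerate bad frames.
    x[..., -1] = QUAL_GOOD
    keep = torch.ones_like(x)
    keep[..., :-1][bad] = 0               # free the corrupted motion frames
    return sample_fn(x, keep)[..., :-1]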

StableMotion-SoccerMocap

We apply StableMotion to train a motion cleanup model on SoccerMocap, a 245-hour raw motion capture dataset captured by a prominent game studio. The resulting model, StableMotion-SoccerMocap, effectively fixes motion artifacts in newly captured motions from the same mocap system.

StableMotion-BrokenAMASS

We benchmark StableMotion on the publicly available AMASS dataset by introducing synthetic corruption to construct a benchmark dataset, BrokenAMASS. This demonstrates that, with StableMotion, motion cleanup models can be effectively trained even on datasets exhibiting severe motion corruption.
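
For intuition, a corruptor along these lines could be used to build such a benchmark from clean clips. Frozen frames and motion pops are artifact types named elsewhere on this page; the probabilities, span lengths, and magnitudes below are our assumptions, not the recipe actually used for BrokenAMASS.

import torch

def corrupt(motion, p_freeze=0.02, p_pop=0.01, pop_scale=0.5, seed=0):
    """Illustrative corruptor for a BrokenAMASS-style benchmark.
    motion: [T, D]. Returns (corrupted, qual) with qual[t] = 0 on bad frames."""
    gen = torch.Generator().manual_seed(seed)
    motion = motion.clone()
    T = motion.shape[0]
    qual = torch.ones(T)
    t = 0
    while t < T:
        r = torch.rand((), generator=gen).item()
        if r < p_freeze:                                   # frozen frames
            span = min(int(torch.randint(5, 20, (1,), generator=gen)), T - t)
            motion[t:t + span] = motion[t].clone()         # hold one pose
            qual[t:t + span] = 0.0
            t += span
        elif r < p_freeze + p_pop:                         # sudden pop
            motion[t:] += pop_scale * torch.randn(motion.shape[-1], generator=gen)
            qual[t] = 0.0
            t += 1
        else:
            t += 1
    return motion, qual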

Test-Time Techniques

We also propose several test-time techniques that harness the sample diversity and dual functionality of the generate-discriminate model to improve consistency and better preserve the content of the original motion clips.

Adaptive cleanup uses soft motion quality evaluation and soft motion inpainting, allowing the model to adaptively scale its modification of each frame to the severity of the artifacts, and thereby better preserve the content of the original frames.
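
One possible reading of this in code: average QualVar predictions over several stochastic passes to obtain a soft per-frame score, then blend original and regenerated frames by that score. The averaging count and sigmoid weighting are illustrative choices, and score_fn is an assumed wrapper around the model's discriminate mode.

import torch

def soft_quality(score_fn, motion, n_samples=8):
    """Soft evaluation (sketch): average per-frame QualVar predictions over
    several stochastic passes to get a continuous score in [0, 1]."""
    return torch.stack([score_fn(motion) for _ in range(n_samples)]).mean(0)

def soft_inpaint(original, regenerated, qual, sharpness=10.0):
    """Soft inpainting (sketch): per-frame blend weighted by predicted quality,
    so clean frames pass through and heavily corrupted frames are replaced.
    The sigmoid schedule is an illustrative choice, not the paper's rule."""
    w = torch.sigmoid(sharpness * (qual - 0.5)).unsqueeze(-1)  # ~1 when clean
    return w * original + (1.0 - w) * regenerated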

Quality-aware ensemble leverages the diversity of diffusion models and the dual functionality of generate-discriminate models. It ensembles diverse candidate motions by selecting the highest-quality candidate according to predicted motion quality scores, yielding more consistent and higher-quality results than a single pass of the cleanup model.
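
A sketch of this selection loop, assuming sample_fn wraps the model's generate mode and score_fn its discriminate mode (both hypothetical helper names):

import torch

def quality_ensemble(sample_fn, score_fn, motion, n_candidates=8):
    """Ensemble sketch: draw several cleanup candidates from the diffusion
    model, score each with the same model's quality prediction, keep the best."""
    best, best_score = None, -float("inf")
    for _ in range(n_candidates):
        cand = sample_fn(motion)                # one stochastic cleanup pass
        s = score_fn(cand).mean().item()        # mean predicted frame quality
        if s > best_score:
            best, best_score = cand, s
    return best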

BibTeX

@article{mu2025stablemotion,
  title={StableMotion: Training Motion Cleanup Models with Unpaired Corrupted Data},
  author={Mu, Yuxuan and Ling, Hung Yu and Shi, Yi and Ojeda, Ismael Baira and Xi, Pengcheng and Shu, Chang and Zinno, Fabio and Peng, Xue Bin},
  journal={arXiv preprint arXiv:2505.03154},
  year={2025}
}