Generative Human Motion Stylization in Latent Space

ICLR 2024
University of Alberta  ·  Noah's Ark Lab, Huawei Canada

Human motion stylization aims to revise the style of an input motion while keeping its content unaltered. Unlike existing works that operate directly in pose space, we leverage the latent space of pretrained autoencoders as a more expressive and robust representation for motion extraction and infusion. Building upon this, we present a novel generative model that produces diverse stylization results of a single motion (latent) code. During training, a motion code is decomposed into two coding components: a deterministic content code, and a probabilistic style code adhering to a prior distribution; a generator then recombines randomly paired content and style codes to reconstruct the corresponding motion codes. Our approach is versatile, allowing the probabilistic style space to be learned from either style-labeled or unlabeled motions, and providing notable flexibility at stylization time. At inference, users can opt to stylize a motion using style cues from a reference motion or a label. Even in the absence of explicit style input, our model facilitates novel re-stylization by sampling from the unconditional style prior distribution. Experimental results show that our proposed stylization models, despite their lightweight design, outperform state-of-the-art methods in style reenactment, content preservation, and generalization across various applications and settings.


Approach Overview
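
A minimal PyTorch-style sketch of the training scheme described in the abstract above: a deterministic content encoder, a probabilistic style encoder regularized toward a Gaussian prior, and a generator that recombines the two codes to reconstruct the motion latent. All module names, dimensions, and loss weights here are illustrative assumptions, not the released implementation, and the sketch reconstructs from a motion's own codes rather than randomly paired ones.

import torch
import torch.nn as nn

LATENT_DIM, CONTENT_DIM, STYLE_DIM = 256, 128, 64  # illustrative sizes

class ContentEncoder(nn.Module):
    """Deterministic mapping from a motion latent code to a content code."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, CONTENT_DIM))
    def forward(self, z_motion):
        return self.net(z_motion)

class StyleEncoder(nn.Module):
    """Probabilistic style code: predicts a Gaussian, regularized toward N(0, I)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(LATENT_DIM, 2 * STYLE_DIM)
    def forward(self, z_motion):
        mu, logvar = self.net(z_motion).chunk(2, dim=-1)
        eps = torch.randn_like(mu)
        style = mu + eps * (0.5 * logvar).exp()  # reparameterization trick
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=-1).mean()
        return style, kl

class Generator(nn.Module):
    """Recombines a content code and a style code into a motion latent code."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(CONTENT_DIM + STYLE_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, LATENT_DIM))
    def forward(self, content, style):
        return self.net(torch.cat([content, style], dim=-1))

# One illustrative training step on a batch of pretrained-autoencoder latents.
content_enc, style_enc, gen = ContentEncoder(), StyleEncoder(), Generator()
z_motion = torch.randn(8, LATENT_DIM)        # stand-in for pretrained encoder outputs
content = content_enc(z_motion)
style, kl = style_enc(z_motion)
z_recon = gen(content, style)
loss = nn.functional.mse_loss(z_recon, z_motion) + 1e-3 * kl
loss.backward()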


Label-based Stylization Gallery

Left: Content motion   Right: Stylized motion

Label-based Stylization (Diverse)


Motion-based Stylization


Prior-based Stylization


A global probabilistic style space, confined by a Gaussian prior distribution, is established through our learning scheme. Our model can then randomly sample styles from this prior to achieve stochastic stylization.
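
Continuing the illustrative modules from the sketch under Approach Overview (names remain hypothetical), prior-based stylization simply draws the style code from the N(0, I) prior instead of extracting it from a reference motion:

with torch.no_grad():
    z_motion = torch.randn(1, LATENT_DIM)    # latent code of a content motion
    content = content_enc(z_motion)          # deterministic content code
    for _ in range(3):                       # three distinct stylizations of one motion
        style = torch.randn(1, STYLE_DIM)    # style sampled from the N(0, I) prior
        z_stylized = gen(content, style)     # re-stylized motion latent
        # z_stylized would then be decoded back to poses by the pretrained decoder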


Probabilistic Style Space


We highlight the properties of our probabilistic style space by showcasing its capacity for diverse stylization and style interpolation.
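
Style interpolation, in the same hypothetical setup as the sketches above: linearly blending two style codes yields a smooth transition between stylizations of the same content motion.

with torch.no_grad():
    z_a, z_b = torch.randn(1, LATENT_DIM), torch.randn(1, LATENT_DIM)  # two reference motions
    style_a, _ = style_enc(z_a)
    style_b, _ = style_enc(z_b)
    content = content_enc(torch.randn(1, LATENT_DIM))                  # content motion latent
    for t in torch.linspace(0.0, 1.0, steps=5):
        style_t = (1.0 - t) * style_a + t * style_b                    # interpolated style code
        z_stylized = gen(content, style_t)                             # motion latent to decode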


Application: Text2Motion Stylization


We showcase the generalization ability of our method by stylizing out-of-distribution (OOD) motions generated by an off-the-shelf text-to-motion (T2M) model.

Content Motion Generation Works 🚀🚀


MoMask: Swift text-driven motion generation through masked generative modeling.
TM2D: Learning dance generation with textual instruction.
Action2Motion: Diverse action-conditioned motion generation.

BibTeX

@inproceedings{guo2024generative,
      title={Generative Human Motion Stylization in Latent Space},
      author={Chuan Guo and Yuxuan Mu and Xinxin Zuo and Peng Dai and Youliang Yan and Juwei Lu and Li Cheng},
      booktitle={The Twelfth International Conference on Learning Representations},
      year={2024},
      url={https://openreview.net/forum?id=daEqXJ0yZo}
}