H-GAP: Humanoid Control with
a Generalist Planner

Zhengyao Jiang*
Yingchen Xu*
(UCL, FAIR at Meta)
Nolan Wagener
(Georgia Tech)
Yicheng Luo
Michael Janner
(UC Berkeley)
Edward Grefenstette
Tim Rocktäschel
Yuandong Tian
(FAIR at Meta)
:sparkles: ICLR 2024 Spotlight :sparkles:
[Paper]            [Code]

We present Humanoid Generalist Autoencoding Planner (H-GAP), a state-action trajectory generative model trained on humanoid trajectories derived from human motion-captured data, capable of adeptly handling downstream control tasks with Model Predictive Control (MPC). For 56 degrees of freedom humanoid, we empirically demonstrate that H-GAP learns to represent and generate a wide range of motor behaviours. Further, without any learning from online interactions, it can also flexibly transfer these behaviors to solve novel downstream control tasks via planning. Notably, H-GAP excels established MPC baselines that have access to the ground truth dynamics model, and is superior or comparable to offline RL methods trained for individual tasks. Finally, we do a series of empirical studies on the scaling properties of H-GAP, showing the potential for performance gains via additional data but not computing.

H-GAP Overview


Left: A VQ-VAE that discretizes continuous state-action trajectories.

Middle: A Transformer that autoregressively models the prior distribution over latent codes, conditioned on the initial state.

Right: Zero-shot adapation to novel tasks via MPC planning with learned Prior Transformer, underscoring H-GAP’s versatility as a generalist model.

Imitation Learning

We train H-GAP on MoCapAct dataset, which contains over 500k rollouts displaying various motion from the CMU MoCap dataset. Starting from the same state, H-GAP with greedy decoding can recover the various behaviours from the reference clips. Note that action noise is added to the final output of H-GAP, so the imitation can’t be achieved by just memorisation.

Long Jumping

Jumping Jack
Cart Wheeling

The reference snippets are short, but H-GAP with greedy decoding can continue the behaviours after reference snippets, sometimes forming a closed loop.

Raise hand

Downstream Control

To test H-GAP’s zero-shot control performance as a generalist model, we design a suite of six control tasks: speed, forward, backward, shift left, rotate and jump. H-GAP matches or beats offline RL methods trained individually for each task. It also outperforms MPC with access to true dynamics, showing benefits of learned action space.


H-GAP with MPC planning can achieve sensible performance on a wide range of downstream tasks in a zero-shot fashion. Starting from an initial state that is irrelevant or contradictory to the objective, the agent will have to figure out a proper transition between motor skills. For example, it may start with a forward motion when the task is to move backwards.





Shift Left