Robotics Simulation Initiative

Humanoid Robot Training Pipeline with NVIDIA Isaac Sim and Isaac Lab

We established a remote GPU workflow for training and reviewing humanoid robot behavior in simulation. The result is a practical foundation for moving from early locomotion experiments toward a repeatable robot learning pipeline.

Why We Built This

Humanoid robotics requires extensive simulation, iteration, and validation before a learned policy should be considered for physical hardware. A remote GPU setup gives the team access to accelerated simulation and training without depending on a dedicated local workstation.

The first milestone was intentionally focused: prove that we can run a modern humanoid environment, start reinforcement learning, collect training metrics, and produce visual artifacts that the team can review.

What We Set Up

Remote GPU workstation: A cloud GPU instance acts as the simulation and training machine for accelerated physics, rendering, and learning workloads.

Isaac Sim runtime: Isaac Sim provides the robotics simulation environment, including physics, robot assets, scene rendering, and the Omniverse runtime.

Isaac Lab training layer: Isaac Lab provides robot learning environments, training scripts, metrics, and checkpoint management.

Unitree G1 baseline: A known humanoid model gives us a stable starting point before introducing custom robot assets or hardware-specific constraints.

What We Accomplished

The training infrastructure works, the baseline Unitree G1 humanoid task is running, and the team can inspect the results.

Key Design Choices

Start with a known humanoid model

Beginning with a built-in humanoid environment reduces uncertainty. It allows us to validate the training system before spending time on custom robot modeling, asset conversion, or hardware-specific integration.

Train headlessly for performance

The main training loop should prioritize simulation throughput and policy learning. Interactive rendering is useful for inspection, but it should not be part of the primary training workload.

Use recorded videos for review

Recorded videos are easier to share, compare, and archive than remote livestream sessions. They also provide a lightweight way for non-operators to assess whether behavior is improving.
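
As a rough illustration of the two choices above (headless training, with recorded clips for review), the sketch below assembles the argument list for a single training launch without executing it. The task ID, script path, and environment count are assumptions that do not come from this write-up and vary between Isaac Lab releases; `TrainRun` and `build_command` are hypothetical helper names. The `--task`, `--num_envs`, `--headless`, and `--video` flags are the ones the Isaac Lab training scripts are expected to accept, but they should be checked against the installed version.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrainRun:
    """Settings for one training launch. Every default here is an assumption."""
    task: str = "Isaac-Velocity-Flat-G1-v0"  # assumed task ID for the G1 baseline
    script: str = "scripts/reinforcement_learning/rsl_rl/train.py"  # assumed path, release-dependent
    num_envs: int = 4096        # assumed number of parallel environments
    headless: bool = True       # no interactive viewport during training
    record_video: bool = False  # save clips during training for later review


def build_command(run: TrainRun, launcher: str = "./isaaclab.sh") -> List[str]:
    """Return the argument list for one launch. Nothing is executed here."""
    cmd = [
        launcher, "-p", run.script,
        "--task", run.task,
        "--num_envs", str(run.num_envs),
    ]
    if run.headless:
        cmd.append("--headless")
    if run.record_video:
        cmd.append("--video")
    return cmd
```

Calling `build_command(TrainRun(record_video=True))` yields a list that can be passed to `subprocess.run` on the GPU instance. Keeping the settings in one small object makes it easier to log exactly how each run was started.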

The important milestone is not visual polish. It is that the team can repeatedly train a humanoid policy, collect metrics, save checkpoints, and review behavior.

Pipeline We Are Building Toward

| Stage | Purpose | Expected Output |
|---|---|---|
| Robot model | Represent the humanoid body, joints, limits, mass properties, and sensors. | A simulation-ready robot asset. |
| Simulation tasks | Define locomotion, balance, terrain, manipulation, recovery, and task goals. | Repeatable training environments. |
| Policy training | Use reinforcement learning or imitation learning to learn robot behaviors. | Checkpoints, reward curves, and candidate policies. |
| Evaluation | Stress-test policies across scenarios, disturbances, randomized conditions, and terrain variation. | Evidence that a policy is robust or needs more work. |
| Deployment path | Prepare the bridge from simulation policies to robot software and hardware testing. | A controlled sim-to-real workflow. |
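
The evaluation stage could be organized roughly as follows: sample a reproducible set of randomized conditions, run the policy once per condition, and report a success rate. This is a minimal sketch, not the project's evaluation code. The `Scenario` fields, the sampling ranges, and the `rollout` callable (which stands in for a real simulation episode) are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Scenario:
    """One randomized test condition. Fields and ranges are illustrative only."""
    friction: float      # ground friction coefficient
    push_force_n: float  # magnitude of an external push, in newtons
    slope_deg: float     # terrain slope, in degrees


def sample_scenarios(n: int, seed: int = 0) -> List[Scenario]:
    """Draw n scenarios from fixed ranges; the same seed gives the same set."""
    rng = random.Random(seed)
    return [
        Scenario(
            friction=rng.uniform(0.4, 1.2),
            push_force_n=rng.uniform(0.0, 150.0),
            slope_deg=rng.uniform(-10.0, 10.0),
        )
        for _ in range(n)
    ]


def evaluate(rollout: Callable[[Scenario], bool],
             scenarios: List[Scenario]) -> Dict[str, float]:
    """Run one rollout per scenario and summarize how many succeeded."""
    if not scenarios:
        raise ValueError("at least one scenario is required")
    successes = sum(1 for s in scenarios if rollout(s))
    return {
        "episodes": float(len(scenarios)),
        "success_rate": successes / len(scenarios),
    }
```

Because the scenario set is seeded, two checkpoints can be scored against identical conditions, which keeps the comparison fair.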

Expected Next Steps

  1. Continue training the baseline Unitree G1 locomotion task until it produces a useful checkpoint.
  2. Record videos from trained checkpoints so the team can evaluate qualitative progress over time.
  3. Track metrics across runs, including reward, episode length, velocity tracking error, and termination causes.
  4. Move from flat-ground walking to rough-terrain locomotion after baseline walking improves.
  5. Add domain randomization so policies are less brittle and more relevant to real-world variation.
  6. Evaluate manipulation and whole-body tasks using additional humanoid environments and task examples.
  7. Define the eventual robot software interface for sensors, control, and hardware systems.
  8. Explore higher-level humanoid behavior models after the low-level training pipeline is reliable.
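
For step 3, one possible shape for per-run metric tracking is sketched below. It reduces a list of episode records to the four quantities named in that step. The `Episode` record and the `summarize` function are hypothetical, and how the records are exported from the training logs is left open.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import Any, Dict, List


@dataclass
class Episode:
    """Per-episode record. Field names are assumptions for illustration."""
    reward: float          # total episode reward
    length: int            # number of simulation steps before the episode ended
    velocity_error: float  # mean velocity tracking error over the episode
    termination: str       # why the episode ended, e.g. "fell" or "timeout"


def summarize(episodes: List[Episode]) -> Dict[str, Any]:
    """Aggregate one run's episodes into metrics that can be compared across runs."""
    if not episodes:
        raise ValueError("at least one episode is required")
    return {
        "mean_reward": mean(e.reward for e in episodes),
        "mean_length": mean(e.length for e in episodes),
        "mean_velocity_error": mean(e.velocity_error for e in episodes),
        "terminations": dict(Counter(e.termination for e in episodes)),
    }
```

Storing one such summary per run, next to the checkpoint it describes, would give the team a simple basis for comparing runs over time.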

Why This Matters

This setup gives the team a repeatable way to learn, test, and review humanoid behavior before taking on the cost and risk of physical hardware. Each change to the robot model, reward function, training task, or environment can be evaluated through metrics and recorded behavior.

The current state is an early but meaningful milestone: the training infrastructure works, a humanoid task is running, and the team can inspect results. The next phase is to turn this into a managed experimentation pipeline with saved checkpoints, comparable runs, richer tasks, and a clear path toward sim-to-real validation.