Why We Built This

Humanoid robotics requires a large amount of simulation, iteration, and validation before any policy should be considered for real hardware. Local workstations can become expensive and hard to scale, so we used a cloud GPU setup to make high-performance simulation available without committing to a dedicated desktop machine.

The first objective was intentionally narrow: prove that we can run a modern humanoid environment, start reinforcement learning, and capture visual evidence of training progress. That gives the team a working baseline before adding more complex tasks such as rough terrain, manipulation, perception, and language-conditioned behavior.

What We Set Up

Remote GPU Workstation

A cloud GPU instance acts as the simulation and training machine, giving us access to accelerated physics, rendering, and learning workloads.

Isaac Sim Runtime

Isaac Sim provides the robotics simulation environment, including physics, robot assets, scene rendering, and the underlying Omniverse runtime.

Isaac Lab Training Layer

Isaac Lab provides the robot learning framework used to run reinforcement learning environments, collect metrics, and produce trained policies.

Key Design Choices

Start with a known humanoid model

We chose a built-in Unitree G1 environment rather than starting with a custom robot. This reduces early uncertainty and lets us validate the simulation, learning, and review workflow before introducing custom mechanical design decisions.

Use headless training for performance

The main training loop should run without an interactive viewer. This keeps GPU resources focused on simulation and learning instead of real-time rendering.

Use recorded videos for review

Cloud livestreaming can be sensitive to networking and firewall constraints. Recording videos gives the team a reliable way to review behavior, share progress, and compare policy quality across runs.

The important milestone is not visual polish; it is that the pipeline can repeatedly train a humanoid policy, produce metrics, and generate artifacts the team can evaluate.

Pipeline We Are Building Toward

1. Robot Model: Begin with an existing humanoid asset, then later evaluate custom robot descriptions and hardware constraints.

2. Simulation Tasks: Define locomotion, balance, terrain, manipulation, and recovery tasks inside simulation.

3. Policy Training: Train policies using reinforcement learning, then compare runs using reward curves and behavior videos.

4. Evaluation: Stress-test trained policies across scenarios, terrain variation, disturbances, and randomized conditions.

5. Deployment Path: Prepare the bridge from simulation policies to robot software, hardware testing, and eventually real-world deployment.

Expected Next Steps

Continue training the baseline Unitree G1 locomotion task until it produces a useful checkpoint.
Record videos from trained checkpoints so the team can evaluate qualitative progress over time.
Track metrics across runs, including reward, episode length, velocity tracking error, and termination causes.
Move from flat-ground walking to rough-terrain locomotion once baseline walking improves.
Add domain randomization to make policies less brittle and more relevant to real-world variation.
Evaluate manipulation and whole-body tasks using additional open-source humanoid environments and Unitree task examples.
Define the eventual robot software interface, likely using ROS 2 for integration with sensors, control, and hardware systems.
Explore higher-level humanoid behavior models, including NVIDIA GR00T-style workflows, after the low-level simulation and training pipeline is reliable.

Why This Matters

This setup gives the team a repeatable way to learn, test, and review humanoid robot behavior before taking on the cost and risk of physical hardware. It also creates a foundation for disciplined experimentation: every change to the robot model, training task, reward function, or environment can be evaluated through metrics and videos.

The current state is an early but meaningful milestone: the training infrastructure works, a humanoid task is running, and we have a way to inspect the results. The next phase is to turn this into a managed experimentation pipeline with saved checkpoints, comparable runs, richer tasks, and a clear path toward sim-to-real validation.

NVIDIA Isaac Tutorial

End-to-End Cloud Tutorial: Running Isaac Sim and Isaac Lab from a Mac

Goal: use a Mac only as the browser/control machine while NVIDIA Brev runs Isaac Sim, Isaac Lab, training jobs, checkpoints, and livestreamed simulation on a remote NVIDIA GPU instance.

Recommended setup: NVIDIA Brev → Isaac Launchable by sreetz → browser-based VS Code → Isaac Sim viewer at /viewer → Isaac Lab training and playback commands from the cloud terminal.

1. Choose the right cloud launchable

In NVIDIA Brev Launchables, select Isaac Launchable. This option is designed to provide Isaac Sim and Isaac Lab in a browser-based workflow, with one tab for VS Code and one tab for the streamed Isaac Sim user interface.

Use: Isaac Launchable for the general Isaac Sim + Isaac Lab workflow.
Avoid for the first run: GR00T post-training launchables, teleoperation-only demos, or unknown demo launchables unless you specifically need those workflows.
Expected cost: roughly $3/hr for the tested launchable, so stop the instance when idle.
GPU requirement: use an NVIDIA RTX-class GPU with RT cores, such as an L40S. Do not use CPU-only instances, a Mac GPU, or NVIDIA data-center GPUs without RT cores (for example A100 or H100), which Isaac Sim does not support.

2. Deploy the instance

Click Deploy Now or Deploy Launchable.
Wait until Brev reports that the instance is running, built, and setup has completed.
Open the Brev instance page.
Find the Using Secure Links section.
Open the shareable URL and log in with your NVIDIA Brev account.
You should land inside a browser-based VS Code environment.

3. Understand the two browser tabs

VS Code tab: used for terminals, commands, files, logs, checkpoints, and training.
Viewer tab: used to see the streamed Isaac Sim UI.
The viewer URL is the same Brev URL as VS Code, but with /viewer at the end.

# Example VS Code URL:
https://isaac-pupyzgohq.brevlab.com/?folder=/workspace

# Matching viewer URL:
https://isaac-pupyzgohq.brevlab.com/viewer

Important: keep only one /viewer tab open. The launchable is intended for a single viewer session at a time.
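Since the viewer URL is the VS Code URL with the query string dropped and /viewer appended, the conversion is easy to script if you switch instances often. A small sketch, assuming the URL looks like the example above (scheme, host, optional ?folder=... query); `viewer_url` is a hypothetical helper, not something the launchable provides.

```bash
#!/usr/bin/env bash
# viewer_url VSCODE_URL: derive the Isaac Sim viewer URL from a Brev VS Code URL.
# Hypothetical helper; assumes URLs shaped like https://<host>/?folder=/workspace.
viewer_url() {
  local url="$1"
  url="${url%%\?*}"   # drop the query string, e.g. ?folder=/workspace
  url="${url%%#*}"    # drop any fragment
  url="${url%/}"      # drop a trailing slash so the result is not //viewer
  printf '%s/viewer\n' "$url"
}

viewer_url "https://isaac-pupyzgohq.brevlab.com/?folder=/workspace"
# prints https://isaac-pupyzgohq.brevlab.com/viewer
```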

4. Accept the Isaac Sim license

The first time Isaac Sim runs, it may refuse to start until the NVIDIA Isaac Sim Additional Software and Materials License is accepted through an environment variable.

export ACCEPT_EULA=Y

To make this persist for future terminals in the same environment, add it to ~/.bashrc.

echo 'export ACCEPT_EULA=Y' >> ~/.bashrc
source ~/.bashrc
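Re-running the echo line above appends a duplicate export to ~/.bashrc each time. A minimal idempotent sketch for bash; `persist_eula` is a hypothetical helper name, and its optional argument exists only so it can be pointed at a different rc file.

```bash
#!/usr/bin/env bash
# persist_eula [RC_FILE]: add the EULA export to RC_FILE (default ~/.bashrc) once.
# Hypothetical helper; safe to run repeatedly.
persist_eula() {
  local rc="${1:-$HOME/.bashrc}"
  local line='export ACCEPT_EULA=Y'
  touch "$rc"
  # -q quiet, -x whole-line match, -F fixed string: append only if the exact line is missing.
  grep -qxF "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
}

# Usage:
#   persist_eula && source ~/.bashrc
```

Terminals that were already open still need `source ~/.bashrc` or a direct `export ACCEPT_EULA=Y`.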

5. Start plain Isaac Sim and verify streaming

Use this only when you want to open the Isaac Sim UI by itself. This is not required during headless training.

/isaac-sim/runheadless.sh

Wait until the terminal prints a line similar to:

[18.942s] app ready

Then open or refresh the viewer tab:

https://isaac-pupyzgohq.brevlab.com/viewer

If Isaac Sim is already running from another terminal, do not start another copy. Only one Isaac Sim or Isaac Lab streaming process should run at a time.
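Before launching anything that streams, it can help to check whether an Isaac process is already running. A rough sketch that scans /proc, so it runs on the Linux cloud instance rather than on the Mac; the default process pattern is an assumption based on the launch commands in this guide, and `isaac_pids` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# isaac_pids [REGEX]: print PIDs of processes whose command line matches REGEX.
# The default pattern is an assumption based on the launch commands in this guide.
isaac_pids() {
  local default='runheadless\.sh|(skrl|rsl_rl)/(train|play)\.py'
  local pattern="${1:-$default}"
  local f pid
  for f in /proc/[0-9]*/cmdline; do
    pid="${f#/proc/}"
    pid="${pid%/cmdline}"
    [ "$pid" = "$$" ] && continue   # skip the current shell
    # cmdline is NUL-separated; a process may exit mid-scan, so hide read errors.
    if { tr '\0' ' ' < "$f"; } 2>/dev/null | grep -Eq -- "$pattern"; then
      printf '%s\n' "$pid"
    fi
  done
}

if [ -n "$(isaac_pids)" ]; then
  echo "An Isaac process is already running; stop it before starting another." >&2
fi
```

The PIDs it prints can be inspected with `ps -p <pid> -o args` before deciding what to stop.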

6. Stop plain Isaac Sim before policy playback

When moving from plain Isaac Sim to an Isaac Lab playback command, stop the existing /isaac-sim/runheadless.sh process first.

# In the terminal running /isaac-sim/runheadless.sh:
Ctrl+C

For policy playback, /isaac-sim/runheadless.sh should not be running anywhere. The play.py command launches its own Isaac Sim session with livestreaming enabled.

7. Run a quick training smoke test with Isaac Ant

Before training a humanoid, run the lightweight Ant task. Isaac Ant is a simple four-legged reinforcement-learning benchmark robot with eight actuated joints. It trains quickly and proves the cloud pipeline works.

export ACCEPT_EULA=Y
python isaaclab/scripts/reinforcement_learning/skrl/train.py --task=Isaac-Ant-v0 --headless

A successful run shows environment setup messages, SKRL/PPO logging, and a progress bar similar to:

[INFO]: Completed setting up the environment...
[skrl:INFO] Environment wrapper: Isaac Lab (single-agent)
100%|████████████████████████| 36000/36000 [...]

8. Confirm checkpoints were created

After training, check that logs and model checkpoints were saved under /workspace/logs.

find /workspace -type d -name "logs" 2>/dev/null
find /workspace -type f -name "*.pt" 2>/dev/null | head

For the Ant smoke test, a successful result should look similar to:

/workspace/logs/skrl/ant/<timestamp>_ppo_torch/checkpoints/best_agent.pt
/workspace/logs/skrl/ant/<timestamp>_ppo_torch/checkpoints/agent_800.pt
/workspace/logs/skrl/ant/<timestamp>_ppo_torch/checkpoints/agent_1600.pt
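To inspect the latest periodic checkpoint rather than best_agent.pt, note that plain `sort` compares text, so agent_1600.pt sorts before agent_800.pt. Version sort (`sort -V`, GNU coreutils) compares the embedded numbers instead. A sketch, assuming the agent_<step>.pt naming shown above; `latest_step_checkpoint` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# latest_step_checkpoint RUN_DIR
# Print the agent_<step>.pt file with the highest step number in RUN_DIR/checkpoints.
latest_step_checkpoint() {
  local dir="$1/checkpoints"
  # -V (version sort) compares digit runs numerically: agent_800 before agent_1600.
  # The name pattern excludes best_agent.pt because it must start with agent_.
  find "$dir" -maxdepth 1 -type f -name 'agent_*.pt' | sort -V | tail -n 1
}

# Example (fill in your own run folder):
#   latest_step_checkpoint "/workspace/logs/skrl/ant/<timestamp>_ppo_torch"
```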

9. Play back the trained Ant policy

Stop any existing Isaac Sim process first, then run the Ant playback command with livestreaming.

export ACCEPT_EULA=Y
python isaaclab/scripts/reinforcement_learning/skrl/play.py --task=Isaac-Ant-v0 --livestream 2 \
  --checkpoint /workspace/logs/skrl/ant/<timestamp>_ppo_torch/checkpoints/best_agent.pt

After the terminal prints app ready or Simulation App Startup Complete, refresh the viewer tab at /viewer.
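If the launch command runs in the background with its output redirected to a log file, the wait for those two startup messages can be scripted. A minimal sketch; `wait_for_ready`, the example log path, and the 600-second default timeout are assumptions, while the two marker strings are the ones quoted above.

```bash
#!/usr/bin/env bash
# wait_for_ready LOGFILE [TIMEOUT_SECONDS]
# Poll LOGFILE every 2 seconds until it contains "app ready" or
# "Simulation App Startup Complete". Returns 0 on success, 1 on timeout.
# Hypothetical helper, not part of Isaac Sim or Isaac Lab.
wait_for_ready() {
  local log="$1" timeout="${2:-600}" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if grep -qE 'app ready|Simulation App Startup Complete' "$log" 2>/dev/null; then
      return 0
    fi
    sleep 2
    waited=$((waited + 2))
  done
  return 1
}

# Example (log path assumed):
#   python isaaclab/scripts/reinforcement_learning/skrl/play.py --task=Isaac-Ant-v0 \
#     --livestream 2 --checkpoint "$CKPT" > /tmp/play.log 2>&1 &
#   wait_for_ready /tmp/play.log && echo "Refresh the /viewer tab now."
```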

10. Find the Unitree G1 humanoid tasks

Once the Ant test works, list the available G1 environments. The tested launchable included both flat-ground and rough-terrain Unitree G1 locomotion tasks.

python isaaclab/scripts/environments/list_envs.py --keyword G1

Relevant task IDs from the tested environment:

Isaac-Velocity-Flat-G1-v0 — train G1 flat-ground velocity-tracking locomotion.
Isaac-Velocity-Flat-G1-Play-v0 — play/view the flat-ground G1 policy.
Isaac-Velocity-Rough-G1-v0 — train G1 rough-terrain locomotion.
Isaac-Velocity-Rough-G1-Play-v0 — play/view the rough-terrain G1 policy.

Start with Isaac-Velocity-Flat-G1-v0. Flat-ground walking is the clean baseline before moving to rough terrain.
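The task IDs above follow a pattern: the playback variant inserts -Play before the version suffix. A small sketch that derives one from the other, assuming the pattern holds for the tasks you use. It does not apply to tasks such as Isaac-Ant-v0, which this guide plays back under its training ID, so confirm the result with list_envs.py. `play_task_id` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# play_task_id TRAIN_TASK_ID: print the matching play task ID.
# Assumes IDs end in -v<N> and the play variant is <name>-Play-v<N>.
play_task_id() {
  local id="$1"
  case "$id" in
    *-Play-v*) printf '%s\n' "$id" ;;                            # already a play ID
    *-v*)      printf '%s-Play-v%s\n' "${id%-v*}" "${id##*-v}" ;; # insert -Play
    *)         echo "unexpected task id: $id" >&2; return 1 ;;
  esac
}

play_task_id Isaac-Velocity-Flat-G1-v0   # prints Isaac-Velocity-Flat-G1-Play-v0
```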

11. Train the Unitree G1 flat-ground locomotion task

Run the G1 training task headlessly. Headless training keeps GPU resources focused on simulation and reinforcement learning instead of rendering.

export ACCEPT_EULA=Y
python isaaclab/scripts/reinforcement_learning/skrl/train.py --task=Isaac-Velocity-Flat-G1-v0 --headless

This trains a policy for Unitree G1 humanoid flat-ground velocity tracking: the simulated humanoid learns to follow movement commands on flat terrain.

12. Keep training alive after closing the browser

Closing the browser is okay only if the training process keeps running on the cloud machine. Use tmux so the job survives browser disconnects.

tmux new -s g1-training

export ACCEPT_EULA=Y
python isaaclab/scripts/reinforcement_learning/skrl/train.py --task=Isaac-Velocity-Flat-G1-v0 --headless

Detach from tmux before closing the browser: press Ctrl+B, release both keys, then press D.

Later, reconnect to the training session:

tmux attach -t g1-training

Do not stop the Brev instance while training. Closing the browser is fine; stopping the instance stops the job.
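If you would rather not type inside tmux, the job can be started in a detached session directly, with its output copied to a log file that survives after the session ends. A sketch; `start_detached`, the session name, and the log path are assumptions, while `tmux new-session -d -s NAME COMMAND` is standard tmux usage. `DRY_RUN=1` prints the tmux command instead of running it, and the log path must not contain single quotes.

```bash
#!/usr/bin/env bash
# start_detached SESSION LOGFILE "COMMAND"
# Run COMMAND in a detached tmux session, appending stdout and stderr to LOGFILE.
# Hypothetical helper; COMMAND should be a single command line, not a pipeline.
start_detached() {
  local session="$1" logfile="$2" cmd="$3"
  local inner="export ACCEPT_EULA=Y; $cmd 2>&1 | tee -a '$logfile'"
  if [ -n "${DRY_RUN:-}" ]; then
    printf 'tmux new-session -d -s %s "%s"\n' "$session" "$inner"
  else
    tmux new-session -d -s "$session" "$inner"
  fi
}

# Example (log path assumed):
#   start_detached g1-training /workspace/g1-training.log \
#     "python isaaclab/scripts/reinforcement_learning/skrl/train.py --task=Isaac-Velocity-Flat-G1-v0 --headless"
#   tail -f /workspace/g1-training.log
```

Because the session closes when the command exits, the log file is the durable record of the run; `tmux attach -t g1-training` still works while training is in progress.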

13. Find the G1 checkpoint

After G1 training completes, locate the best checkpoint.

find /workspace/logs -type f -name "best_agent.pt" | grep -i g1

If the folder names do not contain g1, inspect the latest SKRL log folders:

find /workspace/logs/skrl -maxdepth 3 -type d | sort | tail -50
find /workspace/logs -type f -name "best_agent.pt" | sort | tail -10
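When several runs match, `tail` on unsorted `find` output does not reliably return the newest one. A sketch that picks the most recently written best_agent.pt whose path contains a keyword, using GNU find's `-ipath` and `-printf` (available on a typical Linux instance); `newest_best_checkpoint` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# newest_best_checkpoint [KEYWORD] [ROOT]
# Print the most recently modified best_agent.pt under ROOT whose path contains
# KEYWORD (case-insensitive). Defaults: KEYWORD=g1, ROOT=/workspace/logs.
newest_best_checkpoint() {
  local keyword="${1:-g1}" root="${2:-/workspace/logs}"
  # %T@ = modification time in seconds since the epoch, %p = path.
  find "$root" -type f -name 'best_agent.pt' -ipath "*${keyword}*" -printf '%T@ %p\n' 2>/dev/null \
    | sort -n | tail -n 1 | cut -d' ' -f2-
}

# Example:
#   G1_CKPT=$(newest_best_checkpoint g1)
```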

14. Play back the trained G1 policy

Stop training and any other Isaac Sim process first, then run the G1 playback task with livestreaming and the trained checkpoint.

export ACCEPT_EULA=Y
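# Sort so the newest timestamped run within an experiment folder comes last, then print the choice.
G1_CKPT=$(find /workspace/logs -type f -name "best_agent.pt" | grep -i g1 | sort | tail -n 1)
echo "Using checkpoint: $G1_CKPT"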

python isaaclab/scripts/reinforcement_learning/skrl/play.py \
  --task=Isaac-Velocity-Flat-G1-Play-v0 \
  --livestream 2 \
  --checkpoint "$G1_CKPT"

Wait for the terminal to report that the app is ready, then refresh the viewer:

https://isaac-pupyzgohq.brevlab.com/viewer
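The design notes favor recorded videos for review, while these steps use livestreaming. Recent Isaac Lab RL play scripts accept `--video` and `--video_length` flags that render headlessly and save a clip under the run's log folder; treat those flag names as an assumption and confirm them with `python isaaclab/scripts/reinforcement_learning/skrl/play.py --help` on your install. The sketch below only assembles and runs that command; `record_playback` is a hypothetical helper, and `DRY_RUN=1` prints the command instead of running it.

```bash
#!/usr/bin/env bash
# record_playback TASK CHECKPOINT [STEPS]
# Assumption: play.py supports --video and --video_length (check with --help).
PLAY_PY="isaaclab/scripts/reinforcement_learning/skrl/play.py"

record_playback() {
  local task="$1" ckpt="$2" steps="${3:-300}"
  local cmd=(python "$PLAY_PY" "--task=$task" --headless
             --video --video_length "$steps" --checkpoint "$ckpt")
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "${cmd[*]}"   # show the command without running it
  else
    ACCEPT_EULA=Y "${cmd[@]}"
  fi
}

# Example:
#   record_playback Isaac-Velocity-Flat-G1-Play-v0 "$G1_CKPT" 500
```

Recording runs without the viewer, so it follows the same one-process-at-a-time rule but does not need a /viewer tab open.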

15. View G1 without training

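To see the G1 task without actively training, run the play task. This launches the simulation with livestreaming so it appears in the viewer. Pass --checkpoint to choose exactly which trained policy is shown. In recent Isaac Lab releases, omitting --checkpoint typically makes the skrl play script load the most recent checkpoint it can find in that task's log folder (and fail if there is none), so it is not a reliable way to view an untrained robot.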

export ACCEPT_EULA=Y
python isaaclab/scripts/reinforcement_learning/skrl/play.py --task=Isaac-Velocity-Flat-G1-Play-v0 --livestream 2

16. Try different policies, libraries, and experiments

In this workflow, “trying different policies” can mean changing the RL backend, the task, or the hyperparameters.

First, inspect which RL training backends are installed:

ls isaaclab/scripts/reinforcement_learning

Then compare runs. Example experiment matrix:

Baseline: skrl + Isaac-Velocity-Flat-G1-v0 + seed 42.
Seed comparison: same task and backend with seeds 1, 2, and 3.
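Backend comparison: try rsl_rl or rl_games if installed. Not every task registers an agent config for every backend, so confirm that the task supports a backend before comparing against it.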
Task comparison: move from Flat-G1 to Rough-G1 after the flat baseline works.

# Same task, different seeds
python isaaclab/scripts/reinforcement_learning/skrl/train.py --task=Isaac-Velocity-Flat-G1-v0 --headless --seed 1
python isaaclab/scripts/reinforcement_learning/skrl/train.py --task=Isaac-Velocity-Flat-G1-v0 --headless --seed 2
python isaaclab/scripts/reinforcement_learning/skrl/train.py --task=Isaac-Velocity-Flat-G1-v0 --headless --seed 3

# Different backend, if installed
python isaaclab/scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Velocity-Flat-G1-v0 --headless

# Harder terrain task
python isaaclab/scripts/reinforcement_learning/skrl/train.py --task=Isaac-Velocity-Rough-G1-v0 --headless
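The seed comparison above can also be written as a loop. Runs execute one after another, which keeps to the one-process-at-a-time rule from the earlier steps. A sketch; the `run_seed_sweep` name and the `TASK`, `SEEDS`, and `DRY_RUN` variables are hypothetical knobs added for illustration.

```bash
#!/usr/bin/env bash
# Train the same task once per seed, sequentially.
TRAIN_PY="isaaclab/scripts/reinforcement_learning/skrl/train.py"
TASK="${TASK:-Isaac-Velocity-Flat-G1-v0}"
SEEDS="${SEEDS:-1 2 3}"   # space-separated; split on purpose in the loop below

run_seed_sweep() {
  local seed
  for seed in $SEEDS; do
    if [ -n "${DRY_RUN:-}" ]; then
      echo python "$TRAIN_PY" "--task=$TASK" --headless --seed "$seed"
    else
      # Stop on the first failed run so a partial sweep is not mistaken for a full one.
      ACCEPT_EULA=Y python "$TRAIN_PY" "--task=$TASK" --headless --seed "$seed" || return 1
    fi
  done
}

# Example: DRY_RUN=1 run_seed_sweep
```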

17. Useful inspection commands

# Show available environments
python isaaclab/scripts/environments/list_envs.py

# Search for humanoid-related environments
python isaaclab/scripts/environments/list_envs.py --keyword G1
python isaaclab/scripts/environments/list_envs.py --keyword H1
python isaaclab/scripts/environments/list_envs.py --keyword Humanoid
python isaaclab/scripts/environments/list_envs.py --keyword Unitree

# Find checkpoints
find /workspace/logs -type f -name "*.pt" | sort | tail -20

# Find best checkpoints
find /workspace/logs -type f -name "best_agent.pt" | sort

# Check GPU
nvidia-smi

18. Troubleshooting notes from the working session

docker: command not found: this can happen inside the VS Code container. It is not fatal. Continue with the Isaac commands.
EULA error: run export ACCEPT_EULA=Y.
Viewer does not connect: make sure the URL ends in /viewer, not ?folder=/workspace.
Nothing streams: stop duplicate Isaac Sim processes. Only one training/playback/streaming process should run at a time.
Warnings about deprecated extensions: these are usually normal and can be ignored during setup validation.
GLFW initialization failed in cloud/headless mode: this can appear during livestream startup and is not necessarily fatal if the app continues and the viewer works.
Failed to open /var/run/utmp: this is usually not the blocking issue. Wait for app ready.
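When a launch log is long, it can help to hide the messages listed above as usually harmless and look at what remains. A rough sketch; the benign patterns come from these notes, the error keywords are a guess, and `triage_log` is a hypothetical helper, so adjust both lists to your own logs.

```bash
#!/usr/bin/env bash
# triage_log LOGFILE
# Print error-looking lines (with line numbers), minus messages these notes
# describe as usually benign.
triage_log() {
  local log="$1"
  grep -nEi 'error|traceback|exception|failed' "$log" \
    | grep -vEi 'GLFW initialization failed|/var/run/utmp|deprecated' \
    || true   # an empty result is not a failure
}

# Example: triage_log /tmp/play.log
```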

19. Cost control checklist

Use headless training whenever possible.
Use the viewer only for inspection and playback.
Close extra viewer tabs.
Confirm that checkpoints are being saved regularly before leaving a long experiment unattended.
Stop the Brev instance when idle.
Remember: closing the browser does not stop billing; stopping the instance does.
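Because billing follows instance uptime rather than browser activity, a quick estimate helps when planning long runs. A sketch using the roughly $3/hr figure from step 1 as the default rate; actual pricing depends on the instance you pick, and `estimate_cost` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# estimate_cost HOURS [RATE_PER_HOUR]: print an approximate dollar amount.
estimate_cost() {
  awk -v h="$1" -v r="${2:-3}" 'BEGIN { printf "%.2f\n", h * r }'
}

estimate_cost 8        # an 8-hour training day at ~$3/hr; prints 24.00
estimate_cost 0.5 3.2  # 30 minutes on a hypothetical $3.20/hr instance; prints 1.60
```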

20. Proven pipeline status

This workflow successfully demonstrated the full cloud robotics loop: launch Isaac Sim in the browser, train Isaac Ant, save checkpoints, play back a trained policy, list G1 environments, train Unitree G1 flat-ground locomotion, and view the G1 policy through the streamed Isaac Sim renderer.

Milestone achieved: Mac → NVIDIA Brev → Isaac Launchable → Isaac Lab training → checkpoint artifacts → Isaac Sim livestream playback.
