In this tutorial you will find the steps to create a complete working environment for Reinforcement Learning (RL) and to run your first training and demo.
The training and demo environment includes:
- Multi-Joint dynamics with Contact (MuJoCo): a physics engine that can be used for robotics, biomechanics and machine learning;
- Gymnasium: the open-source Python library for developing and comparing reinforcement learning algorithms (the maintained successor of OpenAI Gym);
- Stable Baselines3 (SB3): a set of implementations of reinforcement learning algorithms in PyTorch;
- PyTorch: the open-source deep learning library;
- TensorBoard: for visualizing the RL training progress;
- Conda: the open-source, cross-platform package and environment management system.
Why Conda?
Using a package management system like Conda in RL has become an industry standard.
The reasons are well-founded and relate to:
- environment isolation (system agnosticism): using Conda, you can “freeze” a training environment to a preferred Python version. If one project requires Python 3.8 and another Python 3.10, you can run them simultaneously on the same laptop without any interference.
- reproducibility of training: it is vital that you or someone else can run the code in exactly the same way, regardless of whether you share the project or change your computer (see the export example below).
- total control over CUDA versions, which is essential for the GPU: with Conda, you can install a different CUDA “toolkit” for each training and demo environment. You can have one project with PyTorch 2.1 and CUDA 11.8 and another with an older version, without them clashing.
- management of non-Python dependencies: MuJoCo relies on code written in C and C++ and requires specific system libraries. Conda installs these pre-compiled binaries, saving you hours of errors like “missing C++ compiler” or “DLL load failed.”
In short, for all my tutorials and projects, I use Conda. You can download it from https://www.anaconda.com/download
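For example, reproducibility in practice can be as simple as exporting the finished environment to a file and recreating it on another machine (environment.yml is just the conventional file name):
conda env export > environment.yml
conda env create -f environment.yml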
Installation Order and the Logic Behind It
For a stable environment that supports both CPU and GPU (via NVIDIA CUDA), the recommended order is:
- Environment Creation (Python): we use Conda to create a clean environment and avoid version conflicts.
- PyTorch (with CUDA support).
- MuJoCo: since version 2.1.0, MuJoCo is open-source and much easier to install (you no longer need .txt license files).
- Gymnasium: the standard interface for RL environments.
- Stable Baselines3 (SB3) & TensorBoard: RL algorithms and visualization tool.
Why this order of installation?
If you install SB3 before PyTorch is configured for the GPU, the system will use only the processor (CPU). By installing PyTorch manually first, you make sure SB3 automatically “sees” the graphics card and uses it when it is available, falling back to the CPU when it is not.
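Conceptually, the fallback works like the check below. This is a sketch of the decision, not SB3's actual source code:

import torch

# Pick the GPU when PyTorch can see one, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Training device:", device)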
Check CUDA version
Open a Command Prompt and run the command:
nvidia-smi
You should see something like this:

Info: The reported CUDA version [13.0] tells me the driver can run applications compiled for CUDA toolkits up to version 13.
Step-by-Step Installation Guide (Optimized for CUDA 13/Windows 11)
Open an Anaconda Prompt and run the following commands in the order shown below.
1. Creating the environment
We will use Python 3.10 because it offers the most reliable compatibility between MuJoCo and Stable Baselines3.
conda create --name rl_test python=3.10 -y
After running that command, at the end you should see something like this:

To activate the environment, run the following command:
conda activate rl_test
After running the command you should see something like this:

2. Installing PyTorch (GPU Engine)
Although I have CUDA 13 on my laptop, I will install the version of PyTorch compiled for the latest stable toolkits available in Conda (usually 12.1 or 11.8). Conda will handle the compatibility itself.
Run the command:
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
During installation, you will see something like this:

Note: This command tells your environment to use the graphics card for all the heavy mathematical calculations in RL.
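You can verify this immediately with a one-liner; on a machine with a working NVIDIA driver, the last value should be True:
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"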
3. Installing MuJoCo
Installing MuJoCo is extremely simple and no longer requires manual environment variable settings in most cases on Windows 11.
Run the command:
pip install mujoco

What does the “dependency conflict” error mean, if it appears?
“ERROR: pip’s dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. torch 2.5.1 requires sympy==1.13.1, but you have sympy 1.14.0 which is incompatible.“
The Torch 2.5.1 package is very strict and requires Sympy version 1.13.1 (a library for symbolic math calculations). However, in my environment Sympy 1.14.0 was automatically installed (probably as a dependency for something else).
To fix the dependency conflict, run the command:
pip install sympy==1.13.1
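Optionally, you can confirm that MuJoCo itself works with a small headless smoke test. The inline XML is just a throwaway model (a single sphere), not something used later in this tutorial:

import mujoco

# A throwaway model: one sphere in an empty world
xml = "<mujoco><worldbody><body><geom size='0.1'/></body></worldbody></mujoco>"
model = mujoco.MjModel.from_xml_string(xml)
data = mujoco.MjData(model)
mujoco.mj_step(model, data)  # advance the simulation by one timestep
print("MuJoCo step OK, simulated time:", data.time)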
4. Installing Gymnasium (Interface)
The following command will install Gymnasium with all the dependencies needed to run the physics simulations in MuJoCo.
pip install "gymnasium[mujoco]"
At the end of the installation, we should see a message telling us that the installation was successful:
![gymnasium[mujoco] successfully installed](https://www.reinforcementlearningpath.com/wp-content/uploads/2026/03/image-5-1024x28.png)
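Before the full verification later in this tutorial, you can also run a quick headless sanity check (no window is opened; we just step the physics once with a random action):

import gymnasium as gym

# Create the Ant environment without rendering and take one random step
env = gym.make('Ant-v5')
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
print('Observation shape:', obs.shape, '| reward:', reward)
env.close()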
5. Installing Stable Baselines3 (SB3) and TensorBoard
The last step is to install the “brain” (RL algorithms) and the “eyes” (progress visualization).
For this last step we run the command:
pip install "stable-baselines3[extra]" tensorboard
At the end of the installation, we should see a message like this:

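As an extra sanity check that the package is importable, you can print its version (the exact number will depend on when you install):
python -c "import stable_baselines3; print(stable_baselines3.__version__)"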
How do you check if everything is working correctly?
It is crucial to check if PyTorch “sees” the graphics card (via CUDA) and if MuJoCo can render a window.
Run this quick script in the terminal:
python -c "import torch; print('1. CUDA unavailable:', torch.cuda.is_available()); print('2. GPU Name:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'N/A'); import mujoco; print('3. MuJoCo successfully installed!'); import gymnasium as gym; env = gym.make('Ant-v5', render_mode='human'); print('4. MuJoCo env was created!'); env.close()"
The result of running the command should be like this:

MuJoCo Viewer
In the simulation, MuJoCo’s role is to apply the laws of physics. It calculates how objects fall, how a robot’s knees bend, or how hard a motor needs to push to lift a weight.
MuJoCo Viewer is a window through which we can look at what is happening in the simulation. Without it, everything would just be a boring list of numbers in a table. With it, we can see the robot moving and how well it moves.
With MuJoCo Viewer we can:
- see mistakes: if the robot falls, in the Viewer we can immediately see if it is because its legs are too thin or its center of gravity is wrong.
- interact with the robot: we can use the mouse to drag and apply forces to the robot while the simulation is running.
- see invisible forces: the viewer can overlay visual aids such as contact points and force arrows, showing where pressure is applied and what the sensors measure.
How to use MuJoCo Viewer
The easiest way to see the Viewer without writing a single line of code is to run the test command:
python -m mujoco.viewer
Info: Don’t forget to activate the Conda environment first.
This is what the MuJoCo Viewer looks like:

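If you prefer to start the viewer from Python instead of the command line, a minimal sketch looks like this. The inline XML is again just a throwaway model (a sphere dropped onto a plane):

import mujoco
import mujoco.viewer

# Throwaway model: a free-falling sphere above a ground plane
xml = """
<mujoco>
  <worldbody>
    <geom type="plane" size="1 1 0.1"/>
    <body pos="0 0 1">
      <freejoint/>
      <geom type="sphere" size="0.1"/>
    </body>
  </worldbody>
</mujoco>
"""
model = mujoco.MjModel.from_xml_string(xml)
data = mujoco.MjData(model)
mujoco.viewer.launch(model, data)  # blocks until the window is closed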
MuJoCo Menagerie
MuJoCo Menagerie is like a store where we can find models of real-world robots, ready to use in our projects.
Its goal is to save users the time of designing and importing robots into the simulator.
Another benefit is standardization. If all users in the world use the same “dog-robot” model from Menagerie, everyone’s results can be compared much more easily.
The robots in Menagerie are faithful copies of real ones (such as the Universal Robots robotic arm or the Unitree robot).
The list of robots and how we can use them can be found here: https://github.com/google-deepmind/mujoco_menagerie
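For example, after cloning the repository you can open one of the models directly in the viewer; unitree_go2 is just one of the many models available there:
git clone https://github.com/google-deepmind/mujoco_menagerie.git
python -m mujoco.viewer --mjcf=mujoco_menagerie/unitree_go2/scene.xml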

Running the first training and demo with MuJoCo, Gymnasium and SB3
Above we did the hard part: setting up the suite of frameworks and libraries needed to train an RL agent. Now that the environment is functional, PyTorch sees the graphics card, and MuJoCo renders correctly, we can take the first step and train an RL agent.
We will use the Ant-v5 environment (a four-legged robot that needs to learn to walk) and the PPO (Proximal Policy Optimization) algorithm, which is the standard for stability in robotics and RL.
Step 1: Creating the Training Script
We will use a Python script to start the training. Open an editor and save the following code as train_ant.py in your project folder.
We are only running 100,000 steps. This is enough for a demo.
"""
PPO Training for Ant-v5
-----------------------------------------------
Train a PPO agent using Stable Baselines3.
TRAINING EXAMPLES:
python train_ant.py
Author: Calin Dragos George
Updated: March 2026
"""
import gymnasium as gym
from stable_baselines3 import PPO
import os
# 1. Create the environment withotu render_mode='human' to speed up the training
env = gym.make('Ant-v5', render_mode=None)
# 2. For logs (TensorBoard)
logdir = "logs"
if not os.path.exists(logdir):
os.makedirs(logdir)
# 3. Initializaing the PPO model
#We use 'MlpPolicy' for vector data (positions, velocities)
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log=logdir, device="cuda")
# 4. Training the agent
print("Start training... check TensorBoard for progress!")
model.learn(total_timesteps=100000, tb_log_name="PPO_Ant_First_Run")
# 5. Save the model
model.save("ppo_ant_model")
print("Model saved successfully!")
env.close()
Step 2: Activate the environment and run the script
Open an Anaconda Prompt and activate the Conda environment created above:
conda activate rl_test
Navigate to the directory where you saved the training script and run the command below:
python train_ant.py
After a while, you should see something like this:

Step 3: Real-time monitoring with TensorBoard
While the agent is training, we will monitor its learning progress using performance graphs.
- Open a second Anaconda Prompt terminal.
- Activate the environment: conda activate rl_test.
- Navigate to the project folder and run the command: tensorboard --logdir=logs
After you run the command, you should see something like this:

Copy the address http://localhost:6006/ into a browser and press Enter.
Initially, the TensorBoard dashboard looks like this:

Watch rollout/ep_rew_mean: If the line goes up, it means your robot ant is learning to walk and getting bigger rewards!
Step 4: Testing the agent
After the short training is over, we want to see what the agent has learned. Create a new file called ant_demo.py and copy the code below:
"""
PPO Demo for Ant-v5
-----------------------------------------------
Demo the PPO agent for Ant Robot.
TRAINING EXAMPLES:
python ant_demo.py
Author: Calin Dragos George
Updated: March 2026
"""
import gymnasium as gym
from stable_baselines3 import PPO
# Load the environment with visual
env = gym.make('Ant-v5', render_mode='human')
# Load the model
model = PPO.load("ppo_ant_model")
obs, info = env.reset()
for _ in range(1000):
# The model decides the next move based on the observation
action, _states = model.predict(obs, deterministic=True)
obs, reward, terminated, truncated, info = env.step(action)
env.render()
if terminated or truncated:
obs, info = env.reset()
env.close()
The demo looks like this:

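If you want a number instead of a visual impression, SB3 also ships an evaluation helper. A minimal sketch, reusing the model file saved by the training script:

import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Evaluate the saved agent over 10 episodes, headless
env = gym.make('Ant-v5')
model = PPO.load("ppo_ant_model")
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10, deterministic=True)
print(f"Mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")
env.close()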
What’s next?
Now that we have a fully functional working environment, we are ready to move on to the interesting part, namely training our own intelligent agents.
In the next series of tutorials, we will explore the world of robotics step by step, learning how to apply Reinforcement Learning for various platforms such as:
- Robotic arms: we will learn how to control industrial arms like the UR5e for precision tasks, from simple movements to object manipulation.
- Drones: we will learn how to train an agent to stabilize and navigate a drone in complex environments.
- Legged robots and humanoids: we move to the advanced level, where we will simulate balance and locomotion for quadrupedal robots (like Unitree) and complex humanoid structures.
- Autonomous mobile robots: we will see how wheeled robots can learn to avoid obstacles and find the optimal path in an unknown space.
Each tutorial will include theoretical explanations, source code, and video demonstrations of the agent’s progress.