AI Robotics: Tutorials, Practical Reinforcement Learning, and Real-World Control

Tutorial: How to Install Stable-Baselines3 the Right Way (Windows & Linux): PyTorch + Gymnasium

by Dragos Calin
in OpenAI Gymnasium, PyTorch, RL Fundamentals, Simulation & Environments, Stable-Baselines3, Tools, Code & Experiment Design

In this tutorial, I will guide you step by step through installing PyTorch, Stable-Baselines3, and Gymnasium on Windows and Linux. It’s exactly how I did it on my personal laptop, where I train RL agents. By the end of this tutorial, you will know how to run training on CPU and GPU, which errors can occur, and how to fix them.

Decision Tree

Decision tree for choosing the PyTorch build that Stable-Baselines3 will run on

I created a decision tree to guide you in choosing the correct installation combination based on the hardware and software of your machine. It covers:

  • Operating system: Windows or Linux,
  • Type of hardware: CPU, NVIDIA GPU, or AMD GPU,
  • The appropriate version of PyTorch. This is the foundation for Stable-Baselines3.

This decision tree is crucial because Stable-Baselines3 depends on PyTorch, and PyTorch comes in many variants depending on the hardware and drivers. If you choose the wrong one, hard-to-diagnose errors will occur. The most common symptoms are:

torch.cuda.is_available() == False
ImportError: DLL load failed
RuntimeError: CUDA driver version is insufficient
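
When one of these symptoms shows up, it helps to collect the basic facts about the installed build in one place. The following is a minimal sketch, not an official diagnostic tool; the function name `diagnose_torch` is made up for this example, and it assumes only that PyTorch, if present, exposes `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util


def diagnose_torch():
    """Collect basic facts about the PyTorch build in the active environment."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment at all.
        return {"torch_installed": False}

    try:
        import torch
    except (ImportError, OSError) as exc:
        # Covers failures such as "DLL load failed" on Windows.
        return {"torch_installed": True, "import_error": str(exc)}

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built with; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in diagnose_torch().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is `None`, a CPU-only build is installed, which explains `torch.cuda.is_available() == False` even on a machine with an NVIDIA GPU.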

Installation order matters

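PyTorch must be installed first. SB3 does not work without it, and installing it yourself lets you control which build (CPU, CUDA, or ROCm) ends up in the environment.

Gymnasium is installed last. SB3 already pulls in the core Gymnasium package as a dependency, but the optional environment families (Atari, MuJoCo, Robotics) have large dependencies and should be installed only when you need them.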

Hardware dictates the correct PyTorch version

  • If you only have a CPU → install the CPU version. It’s the simplest one.
  • If you have an NVIDIA GPU → you must choose the version compatible with CUDA (for example cu121 or cu126).
  • If you have an AMD GPU → choose the ROCm version (for example rocm7.0/7.1).
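
The three cases above can be written down as a small lookup. This is only a sketch: the helper names `pytorch_index_url` and `pip_command` are hypothetical, the CPU and cu121 URLs are the ones used later in this tutorial, and the assumption that a ROCm tag follows the same URL pattern should be checked against the selector on pytorch.org before use.

```python
BASE_URL = "https://download.pytorch.org/whl"


def pytorch_index_url(hardware: str, tag: str = "") -> str:
    """Return the pip --index-url for a hardware type.

    hardware: "cpu", "nvidia", or "amd"
    tag: build tag such as "cu121" (NVIDIA) or "rocm7.0" (AMD, assumed pattern)
    """
    if hardware == "cpu":
        return f"{BASE_URL}/cpu"
    if hardware in ("nvidia", "amd"):
        if not tag:
            raise ValueError("a build tag such as 'cu121' is required for GPU builds")
        return f"{BASE_URL}/{tag}"
    raise ValueError(f"unknown hardware type: {hardware!r}")


def pip_command(hardware: str, tag: str = "") -> str:
    """Build the full pip install command for the chosen hardware."""
    url = pytorch_index_url(hardware, tag)
    return f"pip install torch torchvision torchaudio --index-url {url}"


if __name__ == "__main__":
    print(pip_command("cpu"))
    print(pip_command("nvidia", "cu121"))
```

The helper does not verify that a given tag is actually published; it only shows how the hardware choice maps to the install command.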

System Specifications Used in This Tutorial

Below are the exact system configurations on which I performed the installation and testing of PyTorch, Stable-Baselines3, and Gymnasium.

Main Machine (Host): Windows 11 Pro

  • OS: Microsoft Windows 11 Pro
  • Laptop model: Dell Latitude 5521
  • CPU: 11th Gen Intel® Core™ i7-11850H @ 2.50 GHz (8 Cores / 16 Threads)
  • Integrated GPU: Intel® UHD Graphics (1 GB)
  • Dedicated GPU: NVIDIA GeForce MX450 (2 GB GDDR5, Driver 32.0.15.8108)
  • CUDA toolkit: NVIDIA CUDA 13.0
  • System type: x64-based architecture (UEFI BIOS mode)

In this tutorial, I executed all CPU and GPU benchmarks for PyTorch and Stable-Baselines3 on this Windows host machine.

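PyTorch GPU builds installed with pip bundle their own CUDA runtime, so what they need from the system is a sufficiently recent NVIDIA driver. The driver installed alongside CUDA 13.0 on this machine meets that requirement, which allows GPU-accelerated training.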

Secondary Environment: Ubuntu 22.04 LTS (Virtual Machine)

  • Virtualization platform: Oracle VirtualBox 7.x
  • Guest OS: Ubuntu Jammy 22.04 LTS (64-bit)
  • Allocated resources: 8 vCPUs, 9 GB RAM, 50 GB Disk
  • Graphics controller: VMSVGA (software rendering, no GPU acceleration)

The Linux VM is used exclusively for testing installation procedures and compatibility.
GPU acceleration (CUDA or ROCm) is not available inside the virtual machine.

Why I Use Conda on Both Windows and Linux

The reason is simple: I use Conda on both systems so that my environments stay consistent across platforms.
Conda allows me to create isolated environments with the exact versions of Python, PyTorch, Stable-Baselines3, and Gymnasium I need.

By doing this:

  • I avoid dependency conflicts between pip and apt packages on Linux,
  • I can activate the same environment name (gymenv) on both systems,
  • I can reproduce the same results and code without any modification.
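
To confirm that a script really runs inside the intended Conda environment, a few lines of standard-library Python are enough. This is a minimal sketch; the function name `describe_environment` is made up for this example, and it relies on the `CONDA_DEFAULT_ENV` variable that Conda sets when an environment is activated.

```python
import os
import platform
import sys


def describe_environment():
    """Report which Conda environment and Python interpreter are active."""
    return {
        # Set by "conda activate"; missing when no Conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "(none)"),
        "python": platform.python_version(),
        "executable": sys.executable,
        "os": platform.system(),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this on both machines makes it easy to compare the environment name and Python version side by side.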

Info: In this tutorial, you’ll find the steps for installing OpenAI Gymnasium on Windows using Conda: How to Install OpenAI Gymnasium in Windows and Launch Your First Python RL Environment.

Installing PyTorch for Windows (GPU + CUDA) and Linux (CPU-only or VM)

A. PyTorch Installation on Windows (NVIDIA GPU + CUDA)

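On my Windows 11 Pro laptop, I have an NVIDIA GeForce MX450 GPU and CUDA Toolkit 13.0 installed.
For this setup, I install the PyTorch build for CUDA 12.1 (cu121).

Why CUDA 12.1 and not 13?

The PyTorch wheels bundle their own CUDA runtime, so the build tag does not have to match the CUDA Toolkit installed on the laptop. What matters is that the NVIDIA driver is recent enough for the chosen build.
NVIDIA drivers are backward-compatible with older CUDA runtimes, so a driver that supports CUDA 13 also runs the cu121 build with GPU acceleration. Check the install selector on pytorch.org for the CUDA builds currently offered for your PyTorch version.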

Step-by-step (Windows):

# Step 1 – Activate the existing environment
conda activate gymenv

# Step 2 – Check Python version
python --version
# should display: Python 3.11.x

# Step 3 – Install PyTorch with GPU support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Step 4 – Verify installation
python -c "import torch; print('PyTorch version:', torch.__version__); print('CUDA available:', torch.cuda.is_available())"

Below you can see the confirmation after installing the PyTorch build compatible with CUDA.
The output shows that the GPU-enabled version (cu121) was installed and that CUDA is available in the environment.

Verify PyTorch installation on Windows (NVIDIA GPU + CUDA)
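
`torch.cuda.is_available()` only reports that a device is visible. Running a tiny computation confirms that tensors can actually be created and multiplied on it. The sketch below is an illustration, not part of the original setup; the function name `tensor_smoke_test` is hypothetical, and it falls back to the CPU when no GPU is visible.

```python
import importlib.util


def tensor_smoke_test():
    """Run a tiny matrix product on the GPU if available, otherwise on the CPU."""
    if importlib.util.find_spec("torch") is None:
        return None  # PyTorch is not installed in this environment

    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    a = torch.ones(2, 3, device=device)
    b = torch.full((3, 2), 2.0, device=device)
    product = a @ b  # 2x2 matrix; every entry is 1*2 summed over 3 terms = 6
    return str(device), product.sum().item()


if __name__ == "__main__":
    print(tensor_smoke_test())
```

The sum of the four entries is 24.0 on either device, so a different value or an exception points to a broken installation.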

B. PyTorch Installation on Linux (Virtual Machine)

The Ubuntu 22.04 virtual machine has no GPU access (software rendering only). In this case we will install the CPU-only version of PyTorch.

Install Conda on Your Linux Virtual Machine

If Conda is not installed yet on the Linux machine, this should be the first step.

Install Miniconda (recommended)

Miniconda is a lightweight version of Anaconda — it installs faster and uses less space.

Run these commands one by one in your Linux terminal:

# Download the latest Miniconda installer for Linux
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

# Run the installer
bash Miniconda3-latest-Linux-x86_64.sh

# After it finishes, reload the shell configuration so that the conda command is available
source ~/.bashrc

# Now you can test that Conda works
conda --version

If you see a version number like in the image, Conda is installed and ready.

Install PyTorch for CPU

Now it’s time to install PyTorch, the engine that will power your neural networks.
Since we are using a virtual machine without GPU, we’ll install the CPU version of PyTorch.

# Check the Python version of the base environment. You should see something like: Python 3.13.x
python --version

# Create a new environment for Reinforcement Learning
conda create -n rl python=3.13 -y

# Info: if you get a message about the Conda Terms of Service (tos), accept them with
# the following commands, then create the environment again
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
conda create -n rl python=3.13 -y

# Activate the rl environment
conda activate rl

# Install PyTorch for CPU
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# Verify PyTorch Installation
# Type python and then press Enter
python

# Copy and paste these three lines, then press Enter twice
import torch
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

If you see something like in the image below, with the PyTorch version and "CUDA available: False", then congratulations! PyTorch for CPU is installed correctly.

PyTorch for CPU is installed correctly

Install Stable-Baselines3

Why install Stable-Baselines3 (SB3) now? Because:

  • SB3 is built directly on PyTorch,
  • Now that PyTorch is confirmed to be functional, SB3 can use that build instead of pulling in a different one.

PyTorch is the engine: it performs the numerical computation. Stable-Baselines3 is the driver: it implements the learning algorithms, such as PPO, DQN, SAC, or A2C, on top of that engine.
So, after we’ve put the engine (PyTorch) in place, now we’re going to install the driver (SB3).

# Before installing SB3, make sure your Conda environment is active
# for Windows
conda activate gymenv
# for Linux
conda activate rl

# Universal command (valid for Windows and Linux)
pip install stable-baselines3

# If you want extra features such as TensorBoard, Atari, etc.:
pip install "stable-baselines3[extra]"

# After installation, simply check:
python -c "import stable_baselines3; print(stable_baselines3.__version__)"

Installation check for SB3 on Windows. It is done similarly on Linux.
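
Instead of checking each package separately, the installed versions can be read in one go from the package metadata. This is a small sketch using only the standard library; the function name `installed_versions` is made up for this example, and a value of `None` simply means the package is not installed in the active environment.

```python
from importlib.metadata import PackageNotFoundError, version

PACKAGES = ("torch", "stable-baselines3", "gymnasium")


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Saving this output next to your experiments makes it easier to reproduce the same setup on the other machine.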

Install Gymnasium

Now that you have PyTorch (the engine) and Stable-Baselines3 (the driver) installed, we are going to install Gymnasium. It is the training track where your reinforcement learning (RL) agent will train.

Installation on Windows and Linux

# Before installing Gymnasium, make sure your Conda environment is active
# for Windows
conda activate gymenv
# for Linux
conda activate rl

# Universal command (valid for Windows and Linux)
# Install the basic Gymnasium package. 
# This installs the core environments such as CartPole, MountainCar, Pendulum, etc.
pip install gymnasium

# Verify installation:
# step 1. Type python and then press Enter
python

# step 2. Copy and paste the Python code below. Then press Enter
import gymnasium as gym

env = gym.make("CartPole-v1")
observation, info = env.reset()
print("Environment loaded successfully!")
print("Initial observation:", observation)
env.close()

If the code runs without errors and prints similar values, Gymnasium works. The results should look like the image below:

Installation check for Gymnasium on Windows. It is done similarly on Linux
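
Resetting the environment only exercises part of the API. A short rollout with random actions also checks that `step()` works and that episodes end as expected. This is a minimal sketch under the assumption that the core `gymnasium` package is installed; the function name `random_rollout` and its parameters are made up for this example.

```python
import importlib.util


def random_rollout(max_steps=50, seed=0):
    """Play CartPole-v1 with random actions and return (steps, total_reward)."""
    if importlib.util.find_spec("gymnasium") is None:
        return None  # Gymnasium is not installed in this environment

    import gymnasium as gym

    env = gym.make("CartPole-v1")
    observation, info = env.reset(seed=seed)
    env.action_space.seed(seed)  # make the random actions repeatable

    steps, total_reward = 0, 0.0
    for _ in range(max_steps):
        action = env.action_space.sample()
        observation, reward, terminated, truncated, info = env.step(action)
        steps += 1
        total_reward += float(reward)
        if terminated or truncated:
            break  # the pole fell or the time limit was reached

    env.close()
    return steps, total_reward


if __name__ == "__main__":
    print(random_rollout())
```

A random policy usually drops the pole after a few dozen steps, so a short episode here is normal and not a sign of a problem.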

Verify the Full Reinforcement Learning Setup

Now that everything is installed, let’s test the complete setup by training a small RL agent.
We’ll use Stable-Baselines3 (SB3) together with Gymnasium and PyTorch to solve one of the simplest but most used environments: CartPole-v1.

At this step, our goal is to check that:

  • Gymnasium can create and run environments,
  • SB3 can communicate with PyTorch,
  • Training and logging work correctly.

If this demo runs without errors, our RL setup is working.

Before starting the demo training, we should verify which device the training will run on: the GPU on Windows and the CPU on Linux.

SB3 automatically uses the GPU if PyTorch can see one. If you are unsure whether training actually uses the GPU or only the CPU, you can check it directly.

How to check if SB3 uses GPU or CPU for PyTorch

# Before running the check, make sure your Conda environment is active
# for Windows
conda activate gymenv
# for Linux
conda activate rl

# step 1. Type python and then press Enter
python

# step 2. Copy and paste the Python code below. Then press Enter
import torch
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("Device name:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU only")

The results should look like the images below:

PyTorch uses GPU (on Windows)
PyTorch uses CPU (on Linux)
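
The check above asks PyTorch which device is visible. You can also ask the SB3 model which device it ended up on, or force a device explicitly. The sketch below assumes the `device` argument and the `device` attribute of SB3 models; the function name `sb3_device` is made up for this example.

```python
import importlib.util


def sb3_device(requested="auto"):
    """Create a PPO model and return the device SB3 placed it on."""
    needed = ("torch", "gymnasium", "stable_baselines3")
    if any(importlib.util.find_spec(name) is None for name in needed):
        return None  # at least one of the three packages is missing

    import gymnasium as gym
    from stable_baselines3 import PPO

    env = gym.make("CartPole-v1")
    # device="auto" lets SB3 pick the GPU when available; "cpu" forces the CPU.
    model = PPO("MlpPolicy", env, device=requested, verbose=0)
    device = str(model.device)
    env.close()
    return device


if __name__ == "__main__":
    print("auto ->", sb3_device("auto"))
    print("cpu  ->", sb3_device("cpu"))
```

Forcing `device="cpu"` is useful for small networks such as the CartPole policy, where moving data to the GPU can cost more than it saves.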

Demo training a small RL agent with PyTorch, SB3, and Gymnasium on Windows and Linux

STEP 1: Activate the environment

# Before running the demo, make sure your Conda environment is active
# for Windows
conda activate gymenv
# for Linux
conda activate rl

STEP 2: Create a new Python file

Create a Python file named test_sb3_cartpole.py and copy the code below inside it.

import gymnasium as gym
from stable_baselines3 import PPO

# 1. Create the environment (CartPole)
env = gym.make("CartPole-v1")

# 2. Initialize the agent using the PPO algorithm
model = PPO("MlpPolicy", env, verbose=1)

# 3. Train the agent for 10,000 steps
print("Training started...")
model.learn(total_timesteps=10_000)
print("Training finished!")

# 4. Save the trained model
model.save("ppo_cartpole_test")

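# 5. Test the trained model for a few steps
# The environment was created without a render_mode, so env.render() is not called here.
# To watch the agent, create a separate environment with
# gym.make("CartPole-v1", render_mode="human") (this needs pygame).
obs, info = env.reset()
for _ in range(5):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()

env.close()
print("Everything works!")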

STEP 3: Run the script

# In your terminal:
python test_sb3_cartpole.py

If everything is installed correctly, you’ll see a training log like the ones in these images:

Using cuda device (Windows)
Demo CPU device (Linux)

Note: For simple environments like CartPole, GPU acceleration provides little or no speed-up. But for vision-based or high-dimensional tasks (Atari, MuJoCo, Robotics), especially with larger networks, the GPU can make a big difference.
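
The demo script only steps the trained agent a few times. To get a number out of the setup, SB3 provides `evaluate_policy`, which runs full episodes and reports the mean and standard deviation of the episode reward. The sketch below is an illustration with deliberately tiny settings (`n_steps=64`, 128 training steps), chosen as assumptions to keep it fast; the function name `quick_train_and_evaluate` is hypothetical and the resulting score is not meaningful as a benchmark.

```python
import importlib.util


def quick_train_and_evaluate(train_steps=128, episodes=3):
    """Train PPO very briefly on CartPole-v1 and return (mean, std) of episode reward."""
    needed = ("torch", "gymnasium", "stable_baselines3")
    if any(importlib.util.find_spec(name) is None for name in needed):
        return None  # at least one of the three packages is missing

    import gymnasium as gym
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy
    from stable_baselines3.common.monitor import Monitor

    # Monitor records episode rewards and lengths for the evaluation helper.
    env = Monitor(gym.make("CartPole-v1"))

    # Small rollout and batch sizes so the sketch finishes in seconds.
    model = PPO("MlpPolicy", env, n_steps=64, batch_size=64, device="cpu", verbose=0)
    model.learn(total_timesteps=train_steps)

    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=episodes)
    env.close()
    return float(mean_reward), float(std_reward)


if __name__ == "__main__":
    print(quick_train_and_evaluate())
```

With the 10,000 training steps used in the demo script, the same evaluation call gives a more informative score; CartPole-v1 episodes are capped at 500 steps.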

About the author

About Dragos Calin

Dragos Calin is a robotics engineer and reinforcement learning practitioner focused on building real-world autonomous and remote-controlled robotics for agriculture, edge-AI robotics, and embedded platforms. His work joins simulation, machine learning, and hardware deployment, with a strong emphasis on practical, testable solutions that function outside the lab.

Areas of Expertise:

  • Reinforcement Learning for Robotics
  • Autonomous Agricultural Robots
  • Embedded Systems & Edge AI (Jetson, Raspberry Pi, Arduino)
  • Robotic Simulation & Sim2Real Workflow
  • Sensor Fusion & Control Systems
  • ROS-Based Robotics Development

© 2026 Reinforcement Learning Path
