Learn to train intelligent agents that actually converge

    “What we have to learn to do, we learn by doing.” ― Aristotle

Anyone can now generate Python code for an RL agent. However, the real value has migrated from “code” to “system architecture” and “physical intuition.”

Whether you’re a student, hobbyist, engineer, or AI enthusiast, you need to:

  • understand the math: no more black boxes. We solve the Bellman equation with pen and paper before we write a single line of code.
  • build the logic: learn why agents fail, why gradients explode, and how to fix them manually.
  • own the execution: move from theory to real-world robotics where every step and calculation counts.
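To show what “solving the Bellman equation with pen and paper” looks like before any real code, here is a minimal sketch on a hypothetical two-state MDP (the states, rewards, and discount factor below are my own toy example, not from the tutorial). The values can be checked by hand first, then confirmed by value iteration:

```python
# Hypothetical two-state MDP, small enough to solve by hand:
#   state A: reward 1, transitions to B
#   state B: reward 0, stays in B
# Bellman equation: V(s) = R(s) + gamma * V(next(s))
# By hand, with gamma = 0.5:
#   V(B) = 0 + 0.5 * V(B)  =>  V(B) = 0
#   V(A) = 1 + 0.5 * V(B)  =>  V(A) = 1

gamma = 0.5
rewards = {"A": 1.0, "B": 0.0}
next_state = {"A": "B", "B": "B"}

# Value iteration: repeatedly apply the Bellman backup until values settle.
V = {s: 0.0 for s in rewards}
for _ in range(100):
    V = {s: rewards[s] + gamma * V[next_state[s]] for s in rewards}

print(V)  # converges to {'A': 1.0, 'B': 0.0}, matching the hand calculation
```

The point of working it by hand first is that when the code agrees with your pen-and-paper answer, the Bellman backup stops being a black box.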

In a world full of AI hallucinations, a tutorial that shows you exactly how an ultrasonic sensor behaves in real noise conditions is worth more than 1000 lines of synthetically generated code.

Practice is not the final step of learning. It is the first!


An autonomous farmer robot — built to think in the field, and learn row by row.

Fifteen years ago, I started learning about and building autonomous robots. In the beginning, I followed a classic approach: many pages of theory, followed by practice.

After some time, I realized that this learning model works well for simple tasks. However, as the complexity of the field increases, a new learning strategy becomes necessary. The method that worked for me was based on a balance between theory, examples, illustrations, analogies, and practical applications.

I’ve reengineered the complex world of Deep Reinforcement Learning into a 5-level system designed to take you from beginner to advanced without getting lost in abstract math.

For now, only the first 3 levels are available; the 2 advanced levels will be published in the coming months.


Level 1: RL Fundamentals

Essential. Without this, RL is “black magic”.

As a first step, you need to understand how an agent thinks. I’ve selected the essential topics that take you from “how do I start?” to “I know exactly why this works.” On every page, I don’t just present formulas; I also add analogies and examples (where possible). I break down the logic of how an agent learns from its environment, making sure you don’t get lost in the process.

Here is what I’ll cover in this first level:


Theory is good, but seeing how an agent learns is what makes everything clear.

Using DQN and CNN for teaching an agent to recognize the digit 3

In this level, you’ll train an agent to answer one simple question: “Is the digit in this image a 3?”

I’ve designed this project to be incredibly easy. You don’t need to install any complicated tools or libraries on your computer. You’ll use Google Colab to run everything in the cloud. In this way, you can focus 100% on the logic, not on debugging your installation.

The goal here isn’t just to “run code.” It’s to see how Deep Reinforcement Learning works from the inside. You’ll understand every line of code, and by the end, you’ll see how to apply this same logic to real-world robotics later.

What you will learn:

  • How to describe an RL problem from scratch. This helps you turn a simple idea into a learning goal.
  • Markov Decision Process (MDP). How to translate the problem description into the language of RL.
  • Choosing the right algorithm. Why I’ve used DQN for this specific task.
  • Building the “Eyes” and “Brain.” Creating the environment and the Convolutional Neural Network (CNN) model.
  • Training and testing. How to run the agent live in Google Colab and improve its performance.
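The first two bullets can be sketched as a minimal environment interface. Everything below is my own illustration of the idea (class name, action encoding, and reward values are assumptions, not the tutorial’s actual code): the state is an image, the actions are two guesses, and the reward tells the agent whether it was right.

```python
import random

class DigitEnv:
    """Hypothetical sketch: one episode = one image.
    The agent observes an image (the state) and picks action 0
    ('not a 3') or action 1 ('it is a 3'). Reward is +1 if the
    guess is correct, -1 otherwise, and the episode then ends."""

    ACTIONS = (0, 1)  # 0 = "not a 3", 1 = "is a 3"

    def __init__(self, images, labels):
        self.images, self.labels = images, labels
        self.i = 0

    def reset(self):
        # Pick a random image; the raw image is the state.
        self.i = random.randrange(len(self.images))
        return self.images[self.i]

    def step(self, action):
        correct = (self.labels[self.i] == 3)
        reward = 1.0 if (action == 1) == correct else -1.0
        return None, reward, True, {}  # one guess, then the episode is done

# Toy usage with fake one-pixel "images" labeled 3 and 7:
env = DigitEnv(images=[[0.9], [0.1]], labels=[3, 7])
state = env.reset()
_, reward, done, _ = env.step(1)  # agent guesses "it is a 3"
```

Framing classification as an MDP like this is what lets a DQN (rather than a plain supervised classifier) learn the task from reward alone.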

Structure of the tutorial – to keep things easy to follow, I’ve broken this application into six parts:

Each page is short and clear. You can follow everything step-by-step, and at the end, you’ll have an agent that actually learns from what it sees.


Great for understanding hyperparameters.

Now that you’ve seen an agent learn in Level 2, it’s time to see how different parameters and algorithms suit different types of problems. In Level 3, we move into ‘The Lab.’

I’ve prepared a series of tutorials using classic environments to help you understand when to use one algorithm over another. You’ll look at how an agent learns to balance, then climb, and how it makes decisions in uncertain worlds.
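To preview the kind of hyperparameter tuning this level focuses on, here is a minimal tabular Q-learning loop on a toy “corridor” environment. This is my own illustration, not one of the tutorial’s environments: the three constants `alpha`, `gamma`, and `epsilon` are exactly the knobs you’ll be experimenting with.

```python
import random

random.seed(0)

# Toy corridor: states 0..4, start at 0, goal at 4.
# Actions: 0 = step left, 1 = step right. Reward 1 only on reaching the goal.
N_STATES, GOAL = 5, 4
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # the hyperparameters under study

Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]

for episode in range(500):
    s = 0
    while s != GOAL:
        # Epsilon-greedy action selection: explore with probability epsilon.
        if random.random() < epsilon:
            a = random.choice([0, 1])
        else:
            a = 0 if Q[s][0] > Q[s][1] else 1
        s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
        r = 1.0 if s2 == GOAL else 0.0
        # Q-learning update rule.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# After training, the greedy policy should be "go right" in every state.
policy = [0 if Q[s][0] > Q[s][1] else 1 for s in range(N_STATES)]
print(policy)
```

Try lowering `alpha` or raising `epsilon` and rerunning: watching how the learned policy degrades (or doesn’t) is the fastest way to build intuition for what each hyperparameter does.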

Here is what you’ll find in this level:

In each of these, I keep the same approach: simple explanations, clear code, and the logic behind every decision.


Next Step

If you’re ready to learn more about Reinforcement Learning, it’s important to know the different ways an agent can learn, from trial-and-error to learning from human feedback or expert demonstrations. On the next page I will explain all these strategies to you.

Next >> Learning Strategies

This page was last edited on 27 February 2026