“What we have to learn to do, we learn by doing.” ― Aristotle
The Architect’s Path: Building Intelligence from First Principles
Anyone can now generate Python code for an RL agent. However, the real value has migrated from “code” to “system architecture” and “physical intuition.”
Whether you’re a student, hobbyist, engineer, or AI enthusiast, you need to:
- understand the math: no more black boxes. We solve the Bellman equation with pen and paper before we write a single line of code.
- build the logic: learn why agents fail, why gradients explode, and how to fix them manually.
- own the execution: move from theory to real-world robotics where every step and calculation counts.
In a world full of AI hallucinations, a tutorial that shows you exactly how an ultrasonic sensor behaves in real noise conditions is worth more than 1000 lines of synthetically generated code.
Practice is not the final step of learning. It is the first!
A Learning Strategy Built from Experience

Fifteen years ago, I started learning about and building autonomous robots. In the beginning, I followed a classic approach: many pages of theory, followed by practice.
After some time, I realized that this learning model works well for simple tasks. However, as the complexity of the field increases, a new learning strategy becomes necessary. The method that worked for me was based on a balance between theory, examples, illustrations, analogies, and practical applications.
I’ve reengineered the complex world of Deep Reinforcement Learning into a 5-level system designed to take you from beginner to advanced without getting lost in abstract math.
For now, only the first three levels are available; the two advanced levels will be published in the coming months.
Level 1: RL Fundamentals
Essential. Without this, RL is “black magic”.
As a first step, you need to understand how an agent thinks. I’ve selected the essential topics that take you from “how do I start?” to “I know exactly why this works.” On every page, I don’t just look at formulas; I also add analogies and examples (where possible). I break down the logic of how an agent learns from its environment, making sure you don’t get lost in the process.
Here is what I’ll cover in this first level:
- Learning strategies in deep reinforcement learning – understanding the different ways an agent can learn, from trial-and-error to expert demonstrations.
- How to choose a reinforcement learning algorithm – a practical guide to picking the right algorithm for your specific problem.
- Bellman equation in reinforcement learning – I solve it with pen and paper to eliminate the “black box” feeling.
- From MDP to POMDP – why reinforcement learning often fails in the real world when information is missing.
- SIM2REAL – how to close the gap between a perfect simulation and a messy, physical robot.
- What Is Q-Learning? – a clear breakdown of the formula and the intuition behind it.
- Deep Q Network (DQN) – combining deep learning with RL to learn optimal behavior even when the states cannot be explicitly enumerated.
- Proximal Policy Optimization (PPO) – mastering one of the most popular and stable algorithms used today in robotics.
- Soft Actor-Critic (SAC) – learning about high-performance, “sample-efficient” algorithms for robotics.
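To preview the kind of pen-and-paper Bellman work Level 1 covers, here is a minimal value-iteration sketch. The two-state MDP, its rewards, and the discount factor are all invented for illustration; the point is only to show the Bellman optimality backup being applied until the values stop changing.

```python
import numpy as np

# A tiny, hypothetical 2-state, 2-action MDP (illustrative only).
# P[s, a] -> next state (deterministic here), R[s, a] -> immediate reward.
P = np.array([[0, 1],
              [0, 1]])          # action 0 goes to state 0, action 1 to state 1
R = np.array([[0.0, 1.0],
              [0.0, 2.0]])      # staying in state 1 pays more
gamma = 0.9                     # discount factor

# Value iteration: repeatedly apply the Bellman optimality backup
#   V(s) <- max_a [ R(s, a) + gamma * V(s') ]
V = np.zeros(2)
for _ in range(200):
    V = np.max(R + gamma * V[P], axis=1)

print(V)
```

You can verify the fixed point by hand, exactly as the tutorial does: in state 1 the best action stays in state 1 forever, so V(1) = 2 + 0.9·V(1), giving V(1) = 20, and then V(0) = 1 + 0.9·20 = 19.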
Level 2: Your First Practical Step – The “Digit 3” Agent
Theory is good, but seeing how an agent learns is what makes everything clear.

In this level, you’ll train an agent to answer one simple question: “Is the digit in this image a 3?”
I’ve designed this project to be incredibly easy. You don’t need to install any complicated tools or libraries on your computer. You’ll use Google Colab to run everything in the cloud. In this way, you can focus 100% on the logic, not on debugging your installation.
The goal here isn’t just to “run code.” It’s to see how Deep Reinforcement Learning works from the inside. You’ll understand every line of code, and by the end, you’ll see how to apply this same logic to real-world robotics later.
What you will learn:
- How to describe an RL problem from scratch. It helps you turn a simple idea into a learning goal.
- Markov Decision Process (MDP). How to go from the problem description to the formal language of RL.
- Choosing the right algorithm. Why I’ve used DQN for this specific task.
- Building the “Eyes” and “Brain.” Creating the environment and the Convolutional Neural Network (CNN) model.
- Training and testing. How to run the agent live in Google Colab and improve its performance.
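The environment-building step above can be sketched as a one-decision-per-episode environment. This is a hypothetical skeleton, not the tutorial’s actual class: the name `DigitThreeEnv`, the reward values, and the random stand-in images are all my assumptions, and the real project would use MNIST digits.

```python
import numpy as np

# Hypothetical sketch of the "Digit 3" problem as a one-step RL environment.
# Class name, rewards, and data are illustrative assumptions.
class DigitThreeEnv:
    def __init__(self, images, labels):
        self.images, self.labels = images, labels
        self.idx = 0

    def reset(self):
        # State: one 28x28 grayscale image, chosen at random.
        self.idx = np.random.randint(len(self.images))
        return self.images[self.idx]

    def step(self, action):
        # Actions: 0 = "not a 3", 1 = "it's a 3".
        correct = (action == 1) == (self.labels[self.idx] == 3)
        reward = 1.0 if correct else -1.0
        done = True          # each episode is a single decision
        return self.images[self.idx], reward, done, {}

# Usage with dummy data standing in for MNIST:
images = np.random.rand(10, 28, 28)
labels = np.random.randint(0, 10, size=10)
env = DigitThreeEnv(images, labels)
obs = env.reset()
_, reward, done, _ = env.step(1)
```

Framing classification as a one-step episode like this keeps the MDP trivial, so all of your attention goes to the state representation and the reward signal.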
Structure of the tutorial – to keep things easy to follow, I’ve broken this application into six parts:
- PART 1: Overview of the tutorial.
- PART 2: Problem Definition.
- PART 3: Markov Decision Process (MDP).
- PART 4: Choosing the Algorithm (DQN).
- PART 5: Environment + RL Model + Reward Function.
- PART 6: Training + Testing + Google Colab Access.
Each page is short and clear. You can follow everything step-by-step, and at the end, you’ll have an agent that actually learns from what it sees.
Level 3: Expanding Your Skills – From Simple Balances to Complex Control
Great for understanding hyperparameters.
Now that you’ve seen an agent learn in Level 2, it’s time to see how different parameters and algorithms can be used for different types of problems. In Level 3, we move into ‘The Lab.’
I’ve prepared a series of tutorials using classic environments to help you understand when to use one algorithm over another. You’ll look at how an agent learns to balance, then climb, and how it makes decisions in uncertain worlds.
Here is what you’ll find in this level:
- Q-Learning Example with CartPole – you will understand how the Q-values are updated in Q-Learning for the CartPole task.
- CartPole with DQN – start with the basics of balancing a pole using Deep Q-Networks.
- MountainCar with DQN – learn how an agent handles physics and momentum to reach a goal.
- CartPole with PPO – see why Proximal Policy Optimization is a favorite for stable and reliable training.
- Pendulum with SAC – moving into continuous control with Soft Actor-Critic, perfect for smooth robotic movements.
- FrozenLake with DQN – solving problems where every step and every decision counts.
In each of these, I keep the same approach: simple explanations, clear code, and the logic behind every decision.
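The Q-value update at the heart of the first CartPole tutorial can be sketched in a few lines. The table size, learning rate, and discount factor below are illustrative placeholders, not the tutorial’s exact hyperparameters, and a real CartPole run would first discretize the continuous state into bins.

```python
import numpy as np

# Hypothetical sketch of one tabular Q-learning step; hyperparameters
# and the 4-state discretization are illustrative assumptions.
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, done=False):
    """Q(s, a) <- Q(s, a) + alpha * (target - Q(s, a))."""
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

Q = np.zeros((4, 2))           # 4 discretized states, 2 actions (left/right)
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)
```

Starting from a zero table, this single step moves Q(0, 1) one tenth of the way toward the one-step target of 1.0, which is exactly the incremental behavior the CartPole walkthrough traces value by value.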
Next Step
If you’re ready to learn more about Reinforcement Learning, it’s important to know the different ways an agent can learn, from trial-and-error to learning from human feedback or expert demonstrations. On the next page I will explain all these strategies to you.
Next >> Learning Strategies
This page was last edited on 27 February 2026