The First Principle in Deep Reinforcement Learning

This page was last edited on 19 September 2025

What do a child on a bicycle, a pilot in the sky, and a Reinforcement Learning agent have in common?
They all must act without certainty.

In the field of Reinforcement Learning, everything changes at a dizzying speed. New libraries, models, and optimizers appear and disappear. What does not change is the basic idea: the agent learns to make effective choices in uncertain environments.

The First Principle is the set of basic truths that holds everything together. These fundamental principles are not tied to any framework or learning model. They are universal, and once you have mastered them, you can build anything.

Why start from the first principle?

  • You will have clarity. Starting from the first principle forces you to understand the basic concepts instead of copying existing assumptions or recipes.
  • You will have creativity. Because you start from basic truths rather than from what others have done, you can come up with new, original solutions.
  • It helps you solve complex problems. A big problem becomes approachable if you break it down into its simplest elements and build the solution starting from them.
  • It’s a solid foundation for learning. Instead of memorizing formulas or methods, you understand the “why” behind them.
  • You’ll make better decisions. You’ll choose solutions based on reality and logic, not conventions or traditions.
  • You’ll explain things more easily. It’s simpler to explain a concept to others (or even to an RL agent) from its basic idea than from abstract formulas.

When technology, context, or rules change, the fundamental principles remain valid and guide you.

Deep RL = Optimal decision-making under uncertainty.

The First Principle in Reinforcement Learning is divided into several fundamental components. These components are the basis of any system controlled by an RL agent.

  1. Agent: the main piece, the artificial brain that learns to make optimal decisions under uncertainty. It can be the controller of a robot that learns to walk, a player in a video game, or any other computer program.
  2. Environment: the world in which the agent acts. It can be the floor where the robot learns to walk, the maps of the game, or the application that displays products recommended by an RL agent.
  3. Actions: the choices the agent can make. If the agent controls a robot, the actions can be to go forward, go backward, jump, or pick up an object. If the agent is in a video game, it can go forward, go backward, collect coins, or hit a wall.
  4. States: the information the agent has about the environment at a given moment. If the robot is standing with the ball in front of it and an obstacle next to it, that is its “now” state.
  5. Rewards: the feedback the agent receives after performing an action. A positive reward signals a good outcome and a negative reward a bad one. If the robot falls, the reward is negative. If it stays standing, the reward is positive.

These components of an RL system never change. Whatever algorithm is used (PPO, DQN, SAC), it optimizes actions in uncertain environments.
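
The five components can be sketched as a small interaction loop. This is a minimal illustration, not code from the project: the `CorridorEnv` class, the `random_agent` policy, and the reward values are hypothetical, chosen only to show where the agent, environment, action, state, and reward appear.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a corridor with positions 0..goal."""

    def __init__(self, goal=3):
        self.goal = goal
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the state the agent observes

    def step(self, action):
        # action 1 = move forward, action 0 = move backward (stops at 0)
        move = 1 if action == 1 else -1
        self.position = max(0, self.position + move)
        done = self.position == self.goal
        reward = 1.0 if done else -0.1  # small penalty for every extra step
        return self.position, reward, done


def random_agent(state, rng):
    """Placeholder policy: ignores the state and picks an action at random."""
    return rng.choice([0, 1])


def run_episode(env, policy, rng, max_steps=100):
    state = env.reset()
    total_reward = 0.0
    history = []  # (state, action, reward) for every step taken
    for _ in range(max_steps):
        action = policy(state, rng)                  # the agent chooses
        next_state, reward, done = env.step(action)  # the environment responds
        history.append((state, action, reward))
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward, history


rng = random.Random(0)
total, history = run_episode(CorridorEnv(), random_agent, rng)
print(f"steps={len(history)} total_reward={total:.1f}")
```

A learning algorithm such as DQN would replace `random_agent` with a policy that improves from the recorded rewards. The loop itself stays the same.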

In the first “Detect Digit 3” project, the agent sees a 28×28 image. It chooses: 1 (yes, it’s a 3) or 0 (no).

  • State: the image
  • Action: yes or no
  • Reward: +1 if correct, -1 if wrong
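
The state, action, and reward above could be wired together roughly as follows. This is a sketch under assumptions: the names `digit3_reward` and `DetectThreeEnv` are hypothetical, each image is treated as a one-step episode, and an all-zero image stands in for real digit data.

```python
def digit3_reward(action: int, label: int) -> int:
    """Return +1 if the yes/no answer matches the true label, else -1."""
    said_yes = action == 1
    is_three = label == 3
    return 1 if said_yes == is_three else -1


class DetectThreeEnv:
    """Hypothetical one-step environment over (image, label) pairs."""

    def __init__(self, samples):
        self.samples = samples
        self.index = -1

    def reset(self):
        # Move to the next sample and show only the image to the agent.
        self.index = (self.index + 1) % len(self.samples)
        image, _label = self.samples[self.index]
        return image

    def step(self, action):
        # Assumption: the episode ends after a single decision.
        _image, label = self.samples[self.index]
        return digit3_reward(action, label), True


# Placeholder data: an all-zero 28x28 "image" stands in for real digits.
blank_image = [[0] * 28 for _ in range(28)]
env = DetectThreeEnv([(blank_image, 3), (blank_image, 5)])

state = env.reset()
reward, done = env.step(1)  # the agent answers "yes" for a true 3
print(reward, done)
```

Note that the agent never sees the label. It only sees the image and, after acting, the reward.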

I train it using the Deep Q-Network (DQN) algorithm, which learns Q(s, a), the expected return of taking action a in state s, with a Convolutional Neural Network (CNN) that processes the image.
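
For reference, the loss that DQN commonly minimizes can be written as below. This is the standard textbook form, not necessarily the exact variant used in this project. The symbols θ (weights of the online network), θ⁻ (weights of a periodically updated target network), γ (discount factor), s′ (next state), and a′ (next action) do not appear in the text above and are introduced here for illustration.

```latex
L(\theta) = \mathbb{E}_{(s, a, r, s')} \Big[ \big( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \big)^{2} \Big]
```

If each image is treated as a one-step episode (an assumption; the text does not say), there is no next state, the target reduces to r, and the network simply learns to predict the reward of answering yes or no for a given image.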

It learns to maximize correct predictions by optimizing its decisions.

Common mistakes to avoid:

  • Learning libraries, not concepts.
  • Chasing model performance without understanding the policy.
  • Writing reward functions without thinking about behavior.

If your agent fails, return to the first principle: what decision did it learn to make – and why?
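
One way to ask that question in the “Detect Digit 3” setting is to tabulate what the agent actually decides for each kind of input. The sketch below is illustrative only: `decision_table` and the `always_no` policy are hypothetical, and the data is a placeholder with one sample per digit and no real images.

```python
from collections import Counter


def decision_table(policy, samples):
    """Count (is_three, action) pairs for a policy over labeled samples."""
    table = Counter()
    for image, label in samples:
        action = policy(image)
        table[(label == 3, action)] += 1
    return table


def always_no(image):
    # Hypothetical degenerate policy: answers "no" for every image.
    return 0


# Placeholder data: one sample per digit 0..9; images are omitted (None).
samples = [(None, digit) for digit in range(10)]

table = decision_table(always_no, samples)
correct = table[(True, 1)] + table[(False, 0)]
threes = table[(True, 1)] + table[(True, 0)]
print(f"accuracy={correct / len(samples):.0%}")
print(f"threes detected={table[(True, 1)]} of {threes}")
```

On this placeholder data the policy is right 9 times out of 10 and still never detects a 3. The score looks fine, but the decision it learned is “always say no”, which is exactly what the question above is meant to expose.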

In Deep RL, all the complexity – neural networks, optimizers, training loops – serves one goal: helping the agent make better decisions.

You don’t need to memorize models. You need to understand decision-making.

That’s the foundation. That’s the first principle.

What is the most important decision you make under uncertainty each day? And how would you reformulate it if you were an RL agent?


Before exploring algorithms, training loops, or performance tricks, there’s one idea you need to understand deeply.

It’s not a model.
It’s not a library.
It’s a principle.

The core of Deep Reinforcement Learning is not about tools. It’s about decision-making under uncertainty.

That’s where everything starts. That’s what never changes.

