This page was last edited on 12 November 2025
Normalization is a preprocessing technique that transforms input data into a consistent numerical range or distribution, so that all inputs or observations have similar scales. Comparable scales are important when working with algorithms that rely on gradient-based learning, such as neural networks.
There are two common types of normalization.
1. Min-Max Normalization
Rescales values to a fixed range, usually [0,1] or [−1,1].
The formula for the [0, 1] range is:
$$
x' = \frac{x - x_{\text{min}}}{x_{\text{max}} - x_{\text{min}}}
$$
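A minimal sketch of min-max normalization in plain Python, assuming a feature is given as a list of numbers. To target a general range [a, b] instead of [0, 1], it uses the common linear extension x' = a + (x - x_min)(b - a) / (x_max - x_min). The helper name `min_max_normalize` and the choice to map a constant feature to the lower bound are illustrative assumptions.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale a list of numbers into [new_min, new_max].

    Hypothetical helper for illustration; with the default arguments it
    implements x' = (x - x_min) / (x_max - x_min).
    """
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Constant feature: the formula would divide by zero, so (by assumption)
        # map every value to the lower bound of the target range.
        return [float(new_min) for _ in values]
    scale = (new_max - new_min) / (x_max - x_min)
    return [new_min + (x - x_min) * scale for x in values]


temperature = [5, 10, 20, 40, 80]
print(min_max_normalize(temperature))             # rescaled into [0, 1]
print(min_max_normalize(temperature, -1.0, 1.0))  # rescaled into [-1, 1]
```

The smallest value always maps to the lower bound and the largest to the upper bound, so the result depends entirely on the two extreme values in the data.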
You can read more about min-max normalization in this tutorial: Hands-On: Min-Max Normalization In Action
2. Z-Score Normalization (Standardization)
Transforms data to have a mean of 0 and a standard deviation of 1.
The formula is:
$$
x' = \frac{x - \mu}{\sigma}
$$
Where:
- μ is the mean of the feature.
- σ is the standard deviation of the feature.
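A minimal Z-score sketch using Python's standard `statistics` module. It assumes the population standard deviation (dividing by n); some libraries default to the sample version (dividing by n - 1), which gives slightly different numbers on small datasets. The helper name `z_score_normalize` is hypothetical.

```python
import statistics


def z_score_normalize(values):
    """Standardize a list of numbers: x' = (x - mu) / sigma.

    Hypothetical helper for illustration; sigma is the population
    standard deviation (an assumption, see the note above).
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # Constant feature: every value equals the mean, so return zeros.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


speed = [0.2, 0.3, 0.5, 0.7, 0.9]
z = z_score_normalize(speed)
print(z)
print(statistics.fmean(z), statistics.pstdev(z))  # approximately 0 and 1
```

Unlike min-max, the output is not confined to a fixed interval; it is centered on 0 with unit spread.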
Why is Normalization Important in Reinforcement Learning?
In Deep RL, normalizing the inputs (observations), and sometimes the rewards, is essential for reliable convergence, because the agent uses neural networks to approximate policies or value functions.
Normalization plays several important roles:
- Balances feature influence: Prevents features with large numeric values (e.g., speed in RPM) from dominating smaller ones (e.g., temperature in °C).
- Speeds up convergence: Normalized input leads to smoother and faster optimization using gradient descent.
- Improves numerical stability: Helps prevent issues like vanishing or exploding gradients during backpropagation.
- Makes training robust: Especially important in real-world robotics, where sensor readings can vary in scale.
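To make the "balances feature influence" point concrete, here is a deliberately simplified sketch: a single linear unit trained with squared error (an assumption; real agents use deeper networks, although the gradient of each first-layer weight is likewise proportional to its input). The helper name `linear_grads` is hypothetical, and the input values come from the temperature and speed data used in Example 1.

```python
def linear_grads(weights, bias, features, target):
    """Gradients of 0.5 * (w . x + b - target)^2 with respect to each weight.

    For a single linear unit, dL/dw_i = (prediction - target) * x_i, so each
    weight's gradient scales with the magnitude of its input feature.
    """
    prediction = sum(w * x for w, x in zip(weights, features)) + bias
    error = prediction - target
    return [error * x for x in features]


# Raw observation: temperature = 80 degC, speed = 0.9 m/s.
raw = [80.0, 0.9]
g_raw = linear_grads(weights=[0.0, 0.0], bias=0.0, features=raw, target=1.0)
print(g_raw, g_raw[0] / g_raw[1])

# Same observation after min-max scaling each feature with its data range
# (temperature in [5, 80], speed in [0.2, 0.9]); both become 1.0.
scaled = [(80.0 - 5.0) / (80.0 - 5.0), (0.9 - 0.2) / (0.9 - 0.2)]
g_scaled = linear_grads(weights=[0.0, 0.0], bias=0.0, features=scaled, target=1.0)
print(g_scaled)
```

With raw inputs, the temperature weight receives an update about 89 times larger than the speed weight purely because of units; after scaling, both weights receive equal updates.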
ANALOGY
Imagine a self-driving car:
- The steering wheel turns 0 to 180 degrees.
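- The gas pedal responds to 0 to 10,000 pressure units.
If we program the servo motors to send the same command value to both controls, the car will be badly unbalanced: the gas pedal's range is more than 50 times wider than the steering wheel's (10,000 vs. 180), so the same number has a very different effect on each control.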
That’s what happens in RL when one input feature (like speed) is much larger in magnitude than another (like orientation angle).
The agent learns to rely disproportionately on high-magnitude inputs—not because they are more important, but because they are bigger.
Example 1: Normalizing speed and temperature
Temperature (°C): [5,10,20,40,80]
Speed (m/s): [0.2,0.3,0.5,0.7,0.9]
If we feed these features to the agent as they are, the learning process may wrongly assign more weight to temperature simply because its values are larger.
Step 1: Z-Score Normalization
Compute mean and standard deviation for Temperature:
Mean:
$$
\mu_{\text{temp}} = \frac{5 + 10 + 20 + 40 + 80}{5} = 31
$$
Standard deviation (population form, dividing by n = 5):
$$
\sigma_{\text{temp}} = \sqrt{\frac{(5 - 31)^2 + (10 - 31)^2 + (20 - 31)^2 + (40 - 31)^2 + (80 - 31)^2}{5}} = \sqrt{\frac{3720}{5}} = \sqrt{744} \approx 27.28
$$
Compute mean and standard deviation for Speed:
Mean:
$$
\mu_{\text{speed}} = \frac{0.2 + 0.3 + 0.5 + 0.7 + 0.9}{5} = 0.52
$$
Standard deviation:
$$
\sigma_{\text{speed}} = \sqrt{\frac{(0.2 - 0.52)^2 + (0.3 - 0.52)^2 + (0.5 - 0.52)^2 + (0.7 - 0.52)^2 + (0.9 - 0.52)^2}{5}} = \sqrt{\frac{0.328}{5}} = \sqrt{0.0656} \approx 0.256
$$
Step 2: Normalize each feature value
Normalized Temperature:
$$
\begin{aligned}
x = 5 &\rightarrow \frac{5 - 31}{27.28} \approx -0.95 \\
x = 80 &\rightarrow \frac{80 - 31}{27.28} \approx +1.80
\end{aligned}
$$
Normalized Speed:
$$
\begin{aligned}
x = 0.2 &\rightarrow \frac{0.2 - 0.52}{0.256} \approx -1.25 \\
x = 0.9 &\rightarrow \frac{0.9 - 0.52}{0.256} \approx +1.48
\end{aligned}
$$
After normalization, both features lie in roughly the same numeric range, about [−1.25, +1.80], instead of differing by roughly two orders of magnitude. As a result, neither feature dominates the network's inputs simply because of its units.
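The arithmetic of Steps 1 and 2 can be checked with a few lines of Python. This sketch computes the mean, the population standard deviation, and the Z-scores for both features of Example 1 using only the standard `statistics` module.

```python
import statistics

# Raw feature values from Example 1.
features = {
    "temperature": [5, 10, 20, 40, 80],
    "speed": [0.2, 0.3, 0.5, 0.7, 0.9],
}

for name, values in features.items():
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population std, dividing by n = 5
    normalized = [(x - mu) / sigma for x in values]
    print(f"{name}: mean={mu:.2f}, std={sigma:.3f}")
    print("  normalized:", [round(z, 2) for z in normalized])
```

Using `statistics.stdev` (the sample version) instead would divide by n - 1 and give somewhat larger standard deviations, so it is worth being explicit about which one a pipeline uses.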
Before and After Z-Score Normalization

What to Remember About Normalization
In Deep RL, normalization is not optional: it is critical for reliable and fast training.
We have to normalize:
- Inputs (observations from the environment)
- Sometimes the reward signal (to stabilize learning)
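In an RL loop the full dataset is not available up front, because observations arrive one step at a time. One common approach (assumed here, not prescribed by this tutorial) is to keep running estimates of each feature's mean and variance and Z-score each new observation with them. The sketch below uses Welford's online algorithm; the class name `RunningNormalizer`, the `epsilon` value, and the clipping bound of 5.0 are illustrative assumptions.

```python
import math


class RunningNormalizer:
    """Online Z-score normalizer for observation vectors (hypothetical helper).

    Keeps per-feature running mean and variance with Welford's algorithm,
    so the statistics can be updated one observation at a time.
    """

    def __init__(self, num_features, epsilon=1e-8, clip=5.0):
        self.count = 0
        self.mean = [0.0] * num_features
        self.m2 = [0.0] * num_features  # running sum of squared deviations
        self.epsilon = epsilon          # avoids division by zero early on
        self.clip = clip                # assumed bound for clipping outliers

    def update(self, obs):
        """Fold one observation into the running statistics."""
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def std(self):
        """Population standard deviation per feature (1.0 before any data)."""
        if self.count == 0:
            return [1.0] * len(self.mean)
        return [math.sqrt(m2 / self.count) for m2 in self.m2]

    def normalize(self, obs):
        """Z-score an observation with the current statistics, then clip."""
        out = []
        for x, mu, sigma in zip(obs, self.mean, self.std()):
            z = (x - mu) / (sigma + self.epsilon)
            out.append(max(-self.clip, min(self.clip, z)))
        return out


# Usage: (temperature, speed) observations arriving one step at a time.
norm = RunningNormalizer(num_features=2)
for obs in [(5, 0.2), (10, 0.3), (20, 0.5), (40, 0.7), (80, 0.9)]:
    norm.update(obs)
print(norm.mean)
print(norm.normalize((80, 0.9)))
```

The same class could be applied to a scalar reward with `num_features=1`, although reward normalization schemes vary in practice. Whether to clip, and at what bound, is a design choice.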
What About Actions? Normalize or Not?
In Deep RL, we should normalize actions to [-1, 1] when working with continuous control algorithms like DDPG, TD3, or SAC. Their policy networks output real-valued actions that are typically bounded to [-1, 1] by a tanh output layer, which keeps training stable. These normalized actions are then rescaled to the environment's actual action bounds before being applied.
When using discrete actions, as in DQN or QR-DQN, normalization is not needed. Discrete actions are usually represented by integer indices (e.g., {0, 1, 2}), and the network simply learns which one to pick, so no scaling is required.
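For continuous actions, the policy's [-1, 1] output must be mapped linearly onto the environment's real bounds before it is applied, and the inverse mapping is useful if environment actions are stored in normalized form. A minimal sketch of both directions follows; the function names are hypothetical, and the bounds are borrowed from the steering wheel and gas pedal in the analogy above.

```python
import math


def scale_action(a_norm, low, high):
    """Map each component of a normalized action from [-1, 1] to [low, high]."""
    return [lo + (a + 1.0) * 0.5 * (hi - lo) for a, lo, hi in zip(a_norm, low, high)]


def unscale_action(a_env, low, high):
    """Inverse mapping: environment action in [low, high] back to [-1, 1]."""
    return [2.0 * (a - lo) / (hi - lo) - 1.0 for a, lo, hi in zip(a_env, low, high)]


# Hypothetical bounds: steering angle in [0, 180] degrees,
# gas pedal in [0, 10000] pressure units.
low, high = [0.0, 0.0], [180.0, 10000.0]

# A tanh output layer keeps each pre-activation inside (-1, 1).
pre_activations = [0.0, -0.5]
policy_output = [math.tanh(u) for u in pre_activations]

env_action = scale_action(policy_output, low, high)
print(env_action)
print(unscale_action(env_action, low, high))  # recovers policy_output up to rounding
```

Because the network only ever sees the [-1, 1] range, the same policy architecture works regardless of how wide each control's physical range is.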
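Choose the normalization method depending on context:
- Use Z-score normalization when the data is roughly Gaussian (normally) distributed.
- Use Min-Max normalization when a fixed output range is needed and the feature's minimum and maximum are known; note that a single outlier can compress all other values into a narrow band.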
In real-world robotics, sensor data often comes with very different scales (e.g., voltage, RPM, angle, current) — always normalize!