Adam Optimization in Deep Reinforcement Learning

This page was last edited on 12 November 2025

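Adam (Adaptive Moment Estimation) is a gradient-based optimizer that adjusts the effective learning rate during training, using running estimates of the first and second moments of the gradient.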

It helps neural networks learn faster and more reliably by adapting step sizes for each parameter.

In Deep Reinforcement Learning (Deep RL), we use Adam optimization to improve the training of the agent’s neural network.

Why do we use Adam optimization in Deep RL?

Deep RL needs stable and efficient learning because environments are often noisy and rewards are often sparse.

Adam optimization makes learning smoother by combining momentum and adaptive learning rates.

It tends to cope well with noisy gradients, which matters in RL, where the feedback signal is often unstable.
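As a rough illustration of the point about noisy gradients, the sketch below runs Adam on a toy one-dimensional problem whose gradient is corrupted by Gaussian noise. The objective J(θ) = 0.5·θ², the noise level, the step count, and the function name `adam_on_noisy_quadratic` are all assumptions made for this example; it is not an RL training loop.

```python
import random


def adam_on_noisy_quadratic(steps=3000, alpha=0.01, beta1=0.9, beta2=0.999,
                            eps=1e-8, noise_std=1.0, seed=0):
    """Minimize the toy objective J(theta) = 0.5 * theta**2 from noisy gradients."""
    rng = random.Random(seed)
    theta, m, v = 5.0, 0.0, 0.0
    largest_move = 0.0
    for t in range(1, steps + 1):
        # The true gradient is theta; the Gaussian noise stands in for noisy feedback.
        g = theta + rng.gauss(0.0, noise_std)
        m = beta1 * m + (1 - beta1) * g        # momentum: smoothed gradient
        v = beta2 * v + (1 - beta2) * g * g    # smoothed squared gradient
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        move = alpha * m_hat / (v_hat ** 0.5 + eps)
        largest_move = max(largest_move, abs(move))
        theta -= move
    return theta, largest_move


final_theta, largest_move = adam_on_noisy_quadratic()
print(final_theta, largest_move)
```

With these settings the parameter should end up close to the minimum at 0 even though every individual gradient sample is noisy, and the largest single move should stay on the order of α, because the smoothed gradient is divided by the smoothed gradient magnitude.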

Equation(s) of Adam optimization

The Adam optimizer uses these key steps:

1. Initialize

\( m_0 = 0 \) (first moment vector – mean of gradients)

\( v_0 = 0 \) (second moment vector – uncentered variance of gradients)

2. At each step t

\( g_t = \nabla_\theta J(\theta_t) \) → gradient at time t

\( m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \) → update biased first moment

\( v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \) → update biased second moment

Bias correction:

    \[ \hspace{5mm} \fbox{     \begin{array}{l}         \vspace{2mm} \\         \displaystyle \hat{m}_t = \frac{m_t}{1 - \beta_1^t} \\\\         \displaystyle \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\         \vspace{2mm}     \end{array} } \hspace{5mm} \]
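To see why the correction is needed, it may help to work through the very first step (t = 1) by hand. This is only a worked example; it assumes the zero initialization from step 1 and the commonly used values β1 = 0.9 and β2 = 0.999:

    \[
    \begin{array}{l}
        m_1 = \beta_1 m_0 + (1 - \beta_1)\, g_1 = 0.1\, g_1 \\\\
        v_1 = \beta_2 v_0 + (1 - \beta_2)\, g_1^2 = 0.001\, g_1^2 \\\\
        \displaystyle \hat{m}_1 = \frac{0.1\, g_1}{1 - 0.9^1} = g_1 \\\\
        \displaystyle \hat{v}_1 = \frac{0.001\, g_1^2}{1 - 0.999^1} = g_1^2
    \end{array}
    \]

Without the correction, the first estimates would be shrunk toward zero by factors of 10 and 1000 simply because the moments start at 0. As t grows, the powers of β1 and β2 approach 0, so the correction gradually fades out.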

Parameter update:

    \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\         \displaystyle          \theta_{t+1} = \theta_t - \alpha \cdot \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \\         \vspace{5mm}     \end{array} } \hspace{5mm} \]

Where:

  • \( \alpha \): learning rate (usually 0.001)
  • \( \beta_1 \): decay rate for first moment (default 0.9)
  • \( \beta_2 \): decay rate for second moment (default 0.999)
  • \( \epsilon \): small constant to avoid division by zero (default \( 10^{-8} \))
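The steps above can be transcribed almost line by line into code. The following is a minimal sketch for a single scalar parameter, not a reference implementation; the function name `adam_step` and the toy objective J(θ) = (θ − 3)² are assumptions chosen only for illustration.

```python
import math


def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter. The step index t starts at 1."""
    m = beta1 * m + (1 - beta1) * grad           # biased first moment
    v = beta2 * v + (1 - beta2) * grad * grad    # biased second moment
    m_hat = m / (1 - beta1 ** t)                 # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                 # bias-corrected second moment
    theta = theta - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Hypothetical toy objective: J(theta) = (theta - 3)**2, so the gradient is 2 * (theta - 3).
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    grad = 2.0 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t, alpha=0.01)
print(theta)
```

On this toy problem the parameter should move toward the minimum at θ = 3. Note that the caller has to carry m, v, and the step index t from one call to the next, since the bias correction depends on t.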

ANALOGY

Imagine you are hiking in a forest. You don’t always move in straight lines because the ground is uneven.

You adjust your steps based on the slope (gradient) and also based on how bumpy the path was recently (momentum and variance).

Adam optimization works in a similar way: it adapts the step size depending on past slopes and bumps.

Inputs and outputs of Adam optimization

Inputs:

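  • Current parameters \( \theta_t \)
  • Current gradient \( g_t \)
  • Previous moments \( m_{t-1} \), \( v_{t-1} \)
  • Step count t (needed for bias correction) and the hyperparameters \( \alpha \), \( \beta_1 \), \( \beta_2 \), \( \epsilon \)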

Outputs:

  • Updated parameters \( \theta_{t+1} \)
  • Updated moments \( m_t \), \( v_t \)
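One way to picture these inputs and outputs is as a function that takes the parameters, the gradients, and the optimizer state, and returns new parameters and new state. The sketch below works under that assumption; the names `AdamState` and `adam_update` are hypothetical, and keeping the step counter inside the state is a design choice of this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AdamState:
    m: List[float]   # first moments from the previous step
    v: List[float]   # second moments from the previous step
    t: int = 0       # number of updates applied so far


def adam_update(params: List[float], grads: List[float], state: AdamState,
                alpha: float = 0.001, beta1: float = 0.9, beta2: float = 0.999,
                eps: float = 1e-8) -> Tuple[List[float], AdamState]:
    """Return updated parameters and updated moments; the inputs are not modified."""
    t = state.t + 1
    new_params, new_m, new_v = [], [], []
    for p, g, m, v in zip(params, grads, state.m, state.v):
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        new_params.append(p - alpha * m_hat / (v_hat ** 0.5 + eps))
        new_m.append(m)
        new_v.append(v)
    return new_params, AdamState(new_m, new_v, t)


# Two parameters whose gradients differ in scale by a factor of 10,000.
params = [1.0, -2.0]
state = AdamState(m=[0.0, 0.0], v=[0.0, 0.0])
params, state = adam_update(params, [100.0, -0.01], state)
print(params, state.t)
```

In this example both parameters move by roughly α on the first step even though their gradients differ greatly in scale, which is the per-parameter adaptation described earlier.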
