This page was last edited on 12 November 2025
Adam (short for adaptive moment estimation) is a gradient-based optimization method that updates a model's parameters during training.
It often helps neural networks train faster and more reliably than plain gradient descent by adapting the step size for each parameter.
In Deep Reinforcement Learning (Deep RL), we use Adam optimization to improve the training of the agent’s neural network.
Why do we use Adam optimization in Deep RL?
Deep RL needs stable and efficient learning because environments are noisy and rewards are often sparse.
Adam optimization makes learning smoother by combining momentum and adaptive learning rates.
Because both moment estimates are running averages over recent gradients, a single noisy gradient has only a limited effect on the update, which matters in RL where feedback is often unstable.
Equation(s) of Adam optimization
The Adam optimizer uses these key steps:
1. Initialize
\( m_0 = 0 \) (first moment vector: mean of gradients)
\( v_0 = 0 \) (second moment vector: uncentered variance of gradients)
2. At each step t
\( g_t = \nabla_\theta J(\theta_t) \) → gradient at time t
\( m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \) → update biased first moment
\( v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \) → update biased second moment
Bias correction:
\[
\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}
\]
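To see why the bias correction matters, it may help to work through the first step (t = 1) with the default decay rates 0.9 and 0.999 and both moments initialized to zero. This is an illustrative calculation under those assumptions, not part of the original derivation:

\[
\begin{aligned}
m_1 &= (1 - \beta_1)\, g_1 = 0.1\, g_1, & \hat{m}_1 &= \frac{0.1\, g_1}{1 - 0.9} = g_1, \\
v_1 &= (1 - \beta_2)\, g_1^2 = 0.001\, g_1^2, & \hat{v}_1 &= \frac{0.001\, g_1^2}{1 - 0.999} = g_1^2.
\end{aligned}
\]

Without the correction, the raw moments would be pulled toward zero simply because they start at zero. With it, the first estimates equal the first gradient and its square.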
Parameter update:
\[
\theta_{t+1} = \theta_t - \alpha \cdot \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\]
Where:
- \( \alpha \): learning rate (usually 0.001)
- \( \beta_1 \): decay rate for first moment (default 0.9)
- \( \beta_2 \): decay rate for second moment (default 0.999)
- \( \epsilon \): small constant to avoid division by zero (default \( 10^{-8} \))
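The steps above can be sketched in plain Python for a single scalar parameter. This is a minimal illustration, not a library implementation: the function names `adam_step` and `minimize_quadratic`, and the toy objective J(θ) = θ², are assumptions made for this example.

```python
import math

def adam_step(theta, grad, m, v, t,
              alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter. The step index t starts at 1."""
    m = beta1 * m + (1 - beta1) * grad        # biased first moment
    v = beta2 * v + (1 - beta2) * grad ** 2   # biased second moment
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    theta = theta - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

def minimize_quadratic(theta0=1.0, steps=2000, alpha=0.01):
    """Toy objective (assumed for illustration): J(theta) = theta**2, gradient 2 * theta."""
    theta, m, v = theta0, 0.0, 0.0            # moments start at zero
    for t in range(1, steps + 1):
        grad = 2.0 * theta
        theta, m, v = adam_step(theta, grad, m, v, t, alpha=alpha)
    return theta
```

Calling `adam_step(1.0, 2.0, 0.0, 0.0, 1)` moves the parameter from 1.0 to roughly 0.999: on the first step the bias-corrected ratio is close to 1, so the step size is close to the learning rate.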
ANALOGY
Imagine you are hiking in a forest. You don’t always move in straight lines because the ground is uneven.
You adjust your steps based on the slope (gradient), on the direction you have been heading recently (momentum), and on how bumpy the path has been (variance).
Adam optimization works the same way: it adapts the step size depending on past slopes and bumps.
Inputs and outputs of Adam optimization
Inputs:
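- Current parameters \( \theta_t \)
- Current gradient \( g_t \)
- Previous moments \( m_{t-1} \), \( v_{t-1} \)
- Step counter t (needed for the bias correction)
Outputs:
- Updated parameters \( \theta_{t+1} \)
- Updated moments \( m_t \), \( v_t \)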
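One way to picture this input/output contract is a pure function that takes the parameters, gradients, and optimizer state, and returns new parameters and a new state. The sketch below is only an illustration of that idea: `AdamState`, `init_state`, and `adam_update` are hypothetical names, and the step counter is stored in the state because the bias correction needs it.

```python
import math
from typing import List, NamedTuple, Tuple

class AdamState(NamedTuple):
    m: List[float]  # first-moment estimates, one per parameter
    v: List[float]  # second-moment estimates, one per parameter
    t: int          # number of updates applied so far

def init_state(n_params: int) -> AdamState:
    """Start with zero moments and a zero step counter."""
    return AdamState(m=[0.0] * n_params, v=[0.0] * n_params, t=0)

def adam_update(params: List[float], grads: List[float], state: AdamState,
                alpha: float = 0.001, beta1: float = 0.9,
                beta2: float = 0.999, eps: float = 1e-8
                ) -> Tuple[List[float], AdamState]:
    """Return updated parameters and updated state; the inputs are not modified."""
    t = state.t + 1
    new_params, new_m, new_v = [], [], []
    for theta, g, m_prev, v_prev in zip(params, grads, state.m, state.v):
        m = beta1 * m_prev + (1 - beta1) * g
        v = beta2 * v_prev + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
        new_params.append(theta - alpha * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m)
        new_v.append(v)
    return new_params, AdamState(new_m, new_v, t)

# Two parameters whose gradients differ by four orders of magnitude.
params, state = [1.0, 1.0], init_state(2)
params, state = adam_update(params, [100.0, 0.01], state)
```

After this first update both parameters have moved by approximately the learning rate (0.001), even though their gradients are very different in size. This is the per-parameter step-size adaptation in action.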
References:
- Diederik P. Kingma, Jimmy Ba, “Adam: A Method for Stochastic Optimization” (2014)
- PyTorch documentation on Adam Optimizer