Experience Replay: Learning from the Past

This page was last edited on 12 November 2025

Experience Replay is a technique used in Deep Reinforcement Learning (Deep RL) where the agent stores past experiences in a memory buffer and reuses them later during training.

Each experience is stored as a tuple:

(state, action, reward, next_state, done)

Instead of learning only from the most recent transitions, the agent trains on a batch of past experiences sampled at random from this buffer.
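To make the idea concrete, here is a minimal sketch of a buffer with uniform sampling. The names ReplayBuffer, Transition, push and sample are our own illustrative choices, not part of any particular library, and the fixed capacity with oldest-first eviction is an assumption (a common one, but not the only option).

```python
import random
from collections import deque, namedtuple

# One stored experience, matching the tuple layout described above.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-capacity buffer with uniform random sampling (illustrative sketch)."""

    def __init__(self, capacity, seed=None):
        # deque(maxlen=...) drops the oldest item once the buffer is full.
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling, without replacement inside a single batch.
        batch = self.rng.sample(list(self.memory), batch_size)
        # Regroup a list of transitions into one Transition of per-field tuples.
        return Transition(*zip(*batch))

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buffer.push(state=t, action=0, reward=1.0, next_state=t + 1, done=(t == 4))

batch = buffer.sample(batch_size=2)
```

After the loop, only the three most recent experiences remain in the buffer, and batch.state, batch.action and so on each hold one entry per sampled experience.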

Why do we use Experience Replay in Deep RL?

  • It breaks the correlation between consecutive samples.
  • It improves data efficiency, because the agent can learn from the same experience multiple times.
  • It stabilizes training.
  • It works especially well with off-policy algorithms like DQN, which can learn from experiences collected by older versions of the policy.

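Without experience replay, the model would train on highly correlated consecutive transitions. This hurts learning, because gradient-based training works best when the samples in a batch are roughly independent.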
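The effect of random sampling on correlation can be illustrated with a small experiment. This is only a sketch under stated assumptions: the "environment" is a hypothetical 1-D random walk, and lag1_correlation is a helper we define here to measure how similar each sample is to the one that follows it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D "environment": the state is a random walk, so each state
# is close to the one before it, much like consecutive steps in an episode.
states = np.cumsum(rng.normal(size=10_000))


def lag1_correlation(x):
    """Pearson correlation between each element and the element after it."""
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])


# Training order without replay: states arrive in the order they were observed.
consecutive = states[:1_000]

# Training order with replay: indices are drawn uniformly from the whole buffer.
replayed = states[rng.integers(0, len(states), size=1_000)]

corr_consecutive = lag1_correlation(consecutive)
corr_replayed = lag1_correlation(replayed)
print(f"consecutive: {corr_consecutive:.3f}, replayed: {corr_replayed:.3f}")
```

In this toy setting the consecutive stream should show a correlation close to 1, while the randomly sampled stream should be close to 0.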

ANALOGY

Think of Experience Replay as studying with flashcards.

We don’t just study the last thing we learned. We shuffle the flashcards and review old concepts to remember better.

Some flashcards (experiences) are more useful than others, so we study those more often. This is the idea behind prioritized replay.

HISTORY

  • Experience Replay was introduced by Lin (1992) in early RL work using neural networks.
  • It became popular after DeepMind’s DQN paper (2015).
  • DQN used it to stabilize Q-learning with a neural network when playing Atari games.

How Experience Replay works in practice

  1. Initialize replay buffer D
  2. Observe transition (s, a, r, s′, done)
  3. Store it in buffer D
  4. Sample a random minibatch from D
  5. Compute the loss between the predicted Q-values and the Q-learning targets
  6. Backpropagate and update weights
  7. Repeat from step 2
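The steps above can be sketched in a few lines. Everything specific here is an assumption made for illustration: the toy chain environment in step(), the random behaviour policy, the hyperparameter values, and the use of a linear Q-function over one-hot states as a stand-in for a neural network (so the "backpropagation" in step 6 reduces to a hand-written gradient step).

```python
import random
import numpy as np

N_STATES, N_ACTIONS = 5, 2      # toy chain: action 0 = left, action 1 = right
GAMMA, LR, BATCH_SIZE = 0.9, 0.5, 16

rng = random.Random(0)


def step(state, action):
    """Hypothetical toy environment: reward 1 for reaching the rightmost state."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


# Linear Q-function over one-hot states; stands in for a neural network.
weights = np.zeros((N_STATES, N_ACTIONS))

buffer = []                                     # 1. initialize replay buffer D
state = 0
for _ in range(2_000):
    action = rng.randrange(N_ACTIONS)           # random behaviour policy
    next_state, reward, done = step(state, action)            # 2. observe
    buffer.append((state, action, reward, next_state, done))  # 3. store in D
    state = 0 if done else next_state           # start a new episode when done

    if len(buffer) < BATCH_SIZE:
        continue

    batch = rng.sample(buffer, BATCH_SIZE)      # 4. sample a random minibatch
    s, a, r, s2, d = map(np.array, zip(*batch))

    # 5. Q-learning target: r + gamma * max_a' Q(s', a'),
    #    with no bootstrapping when the episode has ended.
    target = r + GAMMA * weights[s2].max(axis=1) * (1.0 - d)
    td_error = target - weights[s, a]
    loss = np.mean(td_error ** 2)

    # 6. Gradient step on half the mean squared TD error, treating the target
    #    as a constant. np.add.at accumulates correctly for repeated (s, a).
    np.add.at(weights, (s, a), LR * td_error / BATCH_SIZE)

greedy_policy = weights.argmax(axis=1)
```

Because the behaviour policy is random while the target uses the max over actions, this is an off-policy update, which is what allows old experiences to be reused. In this toy chain, the greedy policy read from weights should prefer moving right in every non-terminal state.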

Inputs and Outputs

Inputs:

  • New experience (s, a, r, s′, done)
  • Batch size
  • Sampling method (uniform or prioritized)

Outputs:

  • Batch of experiences for training
  • Optional: importance-sampling weights (for prioritized replay)
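For the prioritized case, the sampling step returns importance-sampling weights alongside the batch. The sketch below follows the proportional variant of prioritized replay as it is commonly described, where experience i is drawn with probability p_i^alpha / sum_k p_k^alpha. The function name sample_prioritized, the default values of alpha and beta, and the choice to normalize by the largest weight in the batch are assumptions for illustration.

```python
import numpy as np


def sample_prioritized(priorities, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Return (indices, importance-sampling weights) for one minibatch.

    Sketch of proportional prioritization; alpha and beta defaults are
    illustrative, not prescribed values.
    """
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(priorities, dtype=float) ** alpha
    probs = scaled / scaled.sum()

    # Sample with replacement so high-priority experiences can repeat.
    indices = rng.choice(len(probs), size=batch_size, p=probs)

    # Importance-sampling weights compensate for non-uniform sampling.
    # Dividing by the largest weight in the batch keeps them in (0, 1].
    weights = (len(probs) * probs[indices]) ** (-beta)
    weights /= weights.max()
    return indices, weights


# Hypothetical priorities, e.g. absolute TD errors of four stored experiences.
priorities = [0.1, 0.1, 0.1, 5.0]
indices, is_weights = sample_prioritized(
    priorities, batch_size=8, rng=np.random.default_rng(0)
)
```

Setting alpha to 0 recovers uniform sampling, in which case every weight equals 1. During training, each sample's loss term would be multiplied by its weight before averaging.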
