Curriculum Learning: Building Knowledge Gradually

This page was last edited on 12 November 2025

Curriculum Learning means training the agent on easier tasks first, then gradually increasing the difficulty as the agent learns. It’s like how humans learn: start simple, then go deeper.

In Reinforcement Learning, this means structuring environments or goals in a logical progression, so that the agent learns faster, trains more stably, and generalizes better.

Why do we use Curriculum Learning in Deep RL?

Because real-world RL tasks are hard. Exploration is inefficient and rewards are sparse.

Learning from scratch is slow and unstable.

Curriculum Learning helps by narrowing the gap between what the agent can already do and what the next task demands. It guides the agent along a learning path, step by step. This often leads to faster convergence and better final performance.

Is there an equation for Curriculum Learning?

Not a single equation like the Bellman equation or the Q-learning update.

But the core idea can be described by:

    \[ \mathbb{T} = \{ \mathbb{T}_1, \mathbb{T}_2, \ldots, \mathbb{T}_n \} \]

Where:

  • Tᵢ is a task at level i. T₁ is the simplest and Tₙ is the hardest.

The agent trains on T₁, then T₂, and so on. A scheduler function decides when to move to the next task.
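A minimal sketch of the ordered task set, assuming each task can be summarized by a name and a single numeric difficulty score. The `Task` class, its fields, and the maze names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str          # hypothetical label for an environment or goal
    difficulty: float  # assumed scalar difficulty; lower means easier


def build_curriculum(tasks):
    """Return the tasks ordered from easiest (T_1) to hardest (T_n)."""
    return sorted(tasks, key=lambda task: task.difficulty)


curriculum = build_curriculum([
    Task("maze_10x10", 3.0),
    Task("maze_3x3", 1.0),
    Task("maze_5x5", 2.0),
])

for level, task in enumerate(curriculum, start=1):
    print(f"T_{level}: {task.name}")
```

In practice difficulty is rarely a single known number, so the ordering is often set by hand or estimated from how well the agent performs on each task.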

Example of a scheduler:

    \[ S(t) = \max \left\{ i : \mathbb{P}(\text{success on } \mathbb{T}_i) > \theta \right\} \]

Where:

  • S(t) = task selected at time t: the highest index i whose success rate exceeds θ
  • P(success on Tᵢ) = the agent’s measured success rate on task Tᵢ
  • θ = predefined threshold (e.g. 80% success)
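One way to read the scheduler formula is "find the hardest task whose success rate exceeds θ". The sketch below implements that reading with 0-based indices and then, as an added assumption, trains on the task just above it. The function names and the default threshold are illustrative only.

```python
def highest_mastered(success_rates, theta):
    """Largest index i with success_rates[i] > theta, or None if none qualify."""
    mastered = [i for i, rate in enumerate(success_rates) if rate > theta]
    return max(mastered) if mastered else None


def select_task(success_rates, theta=0.8):
    """Pick the task to train on next.

    Assumption: the agent trains on the task right after the hardest
    mastered one, and stays on the last task once everything is mastered.
    """
    top = highest_mastered(success_rates, theta)
    if top is None:
        return 0  # nothing mastered yet: start with the easiest task
    return min(top + 1, len(success_rates) - 1)


# Success rates measured on T_1..T_4 (made-up numbers for illustration).
rates = [0.95, 0.85, 0.40, 0.00]
print(select_task(rates))
```

With these made-up rates the first two tasks count as mastered, so the scheduler selects index 2, the third task.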

Main Components of Curriculum Learning

  1. Task Space
    A set of environments or goals with increasing difficulty.
  2. Progression Strategy
    Rules for how to advance to the next task.
    Can be manual, performance-based, or automatic.
  3. Success Metrics
    Define when a task is “mastered.”
    Example: 90% success rate over last N episodes.
  4. Scheduler
    Decides when to switch tasks or increase complexity.
  5. Initialization Policy
    How the agent is placed in the environment.
    E.g., easier start states early in training, harder start states later.
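The success metric from the list above ("90% success rate over last N episodes") can be sketched as a small rolling tracker. The class name, method names, and default window size are assumptions made for this example.

```python
from collections import deque


class SuccessTracker:
    """Tracks episode outcomes over a sliding window of the last N episodes."""

    def __init__(self, window=100, threshold=0.9):
        self.results = deque(maxlen=window)  # old outcomes drop off automatically
        self.threshold = threshold

    def record(self, success):
        self.results.append(bool(success))

    def success_rate(self):
        return sum(self.results) / len(self.results) if self.results else 0.0

    def mastered(self):
        # Require a full window so a few lucky early episodes do not count.
        window_full = len(self.results) == self.results.maxlen
        return window_full and self.success_rate() >= self.threshold
```

Requiring a full window before declaring mastery is a design choice: it trades a slightly slower progression for a less noisy signal.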

ANALOGY

Think of how a child learns math: first numbers, then addition, then multiplication. You don’t start with calculus.

The brain needs structure to build understanding. Same with agents.

Start simple. Learn stable patterns and then move to hard tasks with confidence.

How to split the implementation into steps

Here’s a step-by-step flow:

Step 1: Define the full task (goal environment).
Step 2: Decompose it into simpler sub-tasks.
Step 3: Order them by difficulty.
Step 4: Set criteria to move to the next task (e.g. success rate).
Step 5: Train agent on easiest task.
Step 6: Once criteria met, move to the next task.
Step 7: Repeat until full task is learned.
Step 8: Test performance on full task only.

This works best when tasks are incrementally harder but share features.
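The steps above can be sketched end to end with a deliberately simple stand-in for an agent. Everything here is hypothetical: `ToyAgent` reduces "skill" to one integer, and it is assumed to learn only when a task is at most 10 points above its current skill, which mimics a sparse-reward task that gives no signal when it is far too hard.

```python
from collections import deque


class ToyAgent:
    """Hypothetical stand-in for an RL agent: skill is a single integer."""

    def __init__(self):
        self.skill = 0

    def run_episode(self, difficulty, learn=True):
        success = self.skill >= difficulty
        # Assumption: a learning signal exists only if the task is within reach.
        if learn and difficulty - self.skill <= 10:
            self.skill += 1
        return success


def train_with_curriculum(difficulties, theta=0.8, window=5, max_episodes=200):
    agent = ToyAgent()
    episodes_used = []
    for difficulty in sorted(difficulties):        # Step 3: order by difficulty
        recent = deque(maxlen=window)
        episodes = 0
        while episodes < max_episodes:             # Steps 5-7: train, then advance
            recent.append(agent.run_episode(difficulty))
            episodes += 1
            # Step 4: success rate over the last `window` episodes
            if len(recent) == window and sum(recent) / window >= theta:
                break
        episodes_used.append(episodes)
    return agent, episodes_used


tasks = [10, 20, 30]                               # Step 2: sub-tasks of the full task
agent, episodes_used = train_with_curriculum(tasks)
solved = agent.run_episode(max(tasks), learn=False)  # Step 8: test on the full task

# Baseline: train on the hardest task only, with no curriculum.
baseline = ToyAgent()
for _ in range(200):
    baseline.run_episode(max(tasks))
baseline_solved = baseline.run_episode(max(tasks), learn=False)

print(episodes_used, solved, baseline_solved)
```

In this toy setting the curriculum agent solves the full task after 34 training episodes, while the baseline never improves because the hardest task is out of reach from the start. Real agents are far noisier, so the numbers only illustrate the mechanism.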

Core Concepts and Methods Behind Curriculum Learning

  • Task Decomposition – split hard tasks into manageable parts
  • Reward Shaping – temporarily modify rewards to help learning
  • Transfer Learning – agent reuses knowledge from previous tasks
  • Progressive Complexity – controlled difficulty increase
  • Goal Sampling – select easier goals first (related ideas appear in HER, which relabels goals with ones the agent actually reached, and in Go-Explore)
  • Self-Paced Learning – agent chooses when to move forward
  • Teacher-Student Model – an external teacher chooses the next lesson
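The Teacher-Student idea above can be sketched with a simple heuristic: the teacher proposes the task on which the student's score changed the most recently, often called learning progress. This heuristic, the function names, and the window size are assumptions for illustration, not something the list prescribes.

```python
def learning_progress(scores, window=3):
    """Absolute change between the last `window` scores and the `window` before."""
    if len(scores) < 2 * window:
        return float("inf")  # assumption: barely-tried tasks get priority
    recent = sum(scores[-window:]) / window
    older = sum(scores[-2 * window:-window]) / window
    return abs(recent - older)


def teacher_pick(history, window=3):
    """Return the index of the task with the largest learning progress.

    `history[i]` is the list of past scores on task i. Ties go to the
    lowest index, i.e. the easier task.
    """
    progress = [learning_progress(scores, window) for scores in history]
    return max(range(len(history)), key=lambda i: progress[i])


history = [
    [1, 1, 1, 1, 1, 1],  # already mastered: no change
    [0, 0, 0, 1, 1, 1],  # improving quickly
    [0, 0, 0, 0, 0, 0],  # too hard for now: no change
]
print(teacher_pick(history))
```

Here the teacher picks the middle task, since it is the only one where the student's scores are still moving.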
