This page was last edited on 12 November 2025
Curriculum Learning means training the agent on easier tasks first, then gradually increasing the difficulty as the agent learns. It’s like how humans learn: start simple, then go deeper.
In Reinforcement Learning, it means structuring environments or goals in a logical progression, so that the agent learns faster, trains more stably, and generalizes better.
Why do we use Curriculum Learning in Deep RL?
Because real-world RL tasks are hard. Exploration is inefficient and rewards are sparse.
Learning from scratch is slow and unstable.
Curriculum Learning helps by narrowing the gap between what the agent can already do and what the next task demands. It guides the agent along a learning path, step by step. This often leads to faster convergence and better final performance.
Is there an equation for Curriculum Learning?
Not a single equation like the Bellman equation or the Q-learning update.
But the core idea can be described by:
\[
\mathbb{T} = \{ \mathbb{T}_1, \mathbb{T}_2, \ldots, \mathbb{T}_n \}
\]
Where:
- Tᵢ is a task at level i
- T₁ is the simplest task
- Tₙ is the hardest task
The agent trains on T₁, then T₂, and so on. A scheduler function decides when to move to the next task.
Example of a scheduler:
\[
S(t) = \arg\max_i \left\{ \mathbb{P}(\text{success on } \mathbb{T}_i) > \theta \right\}
\]
Where:
- S(t) = selected task at time t
- P(success) = success rate on the current task
- θ = predefined threshold (e.g. 80% success)
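One possible reading of the scheduler rule, as a minimal Python sketch. It assumes the agent moves up one level at a time and that per-task success rates are tracked elsewhere. The name `select_task` and its parameters are illustrative, not part of any library.

```python
from typing import Sequence


def select_task(success_rates: Sequence[float], current: int, threshold: float = 0.8) -> int:
    """Return the index of the task to train on next.

    Assumption: the agent moves up one level once its success rate on the
    current task exceeds the threshold, and never moves past the last task.
    """
    is_last = current == len(success_rates) - 1
    if not is_last and success_rates[current] > threshold:
        return current + 1
    return current


# Example: the agent has mastered task 0 but not task 1.
rates = [0.92, 0.40, 0.00]
next_from_0 = select_task(rates, current=0)  # advances to task 1
next_from_1 = select_task(rates, current=1)  # stays on task 1
```

Other schedulers are possible, for example ones that also move back to an easier task when performance drops. This sketch only covers the simple forward case.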
Main Components of Curriculum Learning
- Task Space – A set of environments or goals with increasing difficulty.
- Progression Strategy – Rules for how to advance to the next task. Can be manual, performance-based, or automatic.
- Success Metrics – Defines when a task is “mastered.” Example: 90% success rate over the last N episodes.
- Scheduler – Decides when to switch tasks or increase complexity.
- Initialization Policy – How the agent is placed in the environment. E.g., easier starts at the beginning, harder starts later.
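The success metric above ("90% success rate over the last N episodes") can be sketched as a rolling window. This is a minimal illustration, not a standard API. The class name `SuccessTracker` and the choice to wait until the window is full before declaring mastery are assumptions.

```python
from collections import deque


class SuccessTracker:
    """Tracks success over the last `window` episodes of one task."""

    def __init__(self, window: int = 100, threshold: float = 0.9):
        self.results = deque(maxlen=window)  # old episodes fall out automatically
        self.threshold = threshold

    def record(self, success: bool) -> None:
        self.results.append(bool(success))

    def success_rate(self) -> float:
        if not self.results:
            return 0.0
        return sum(self.results) / len(self.results)

    def mastered(self) -> bool:
        # Assumption: require a full window so a few lucky early
        # episodes do not count as mastery.
        window_full = len(self.results) == self.results.maxlen
        return window_full and self.success_rate() >= self.threshold


# Example: 9 successes and 1 failure in a window of 10.
tracker = SuccessTracker(window=10, threshold=0.9)
for outcome in [True] * 9 + [False]:
    tracker.record(outcome)
is_mastered = tracker.mastered()
```

A scheduler can then call `mastered()` after each episode and reset the tracker when it switches tasks.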
ANALOGY
Think of how a child learns math. First they learn numbers.
Then addition. Then multiplication. You don’t start with calculus.
The brain needs structure to build understanding. Same with agents.
Start simple. Learn stable patterns and then move to hard tasks with confidence.
How to split the implementation into steps
Here’s a step-by-step flow:
Step 1: Define the full task (goal environment).
Step 2: Decompose it into simpler sub-tasks.
Step 3: Order them by difficulty.
Step 4: Set criteria to move to the next task (e.g. success rate).
Step 5: Train agent on easiest task.
Step 6: Once criteria met, move to the next task.
Step 7: Repeat until full task is learned.
Step 8: Test performance on full task only.
This works best when tasks are incrementally harder but share features.
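The steps above can be sketched as a small training loop. This is a toy illustration under strong assumptions: the "agent" is a single `skill` number, each task is a single `difficulty` number, and the success model in `success_probability` is invented for the example. A real setup would replace these with an RL agent and actual environments.

```python
import random
from collections import deque


def success_probability(skill: float, difficulty: float) -> float:
    """Toy model: success is likelier as skill approaches the task difficulty."""
    gap = max(0.0, difficulty - skill)
    return max(0.05, 1.0 - gap)


def train_with_curriculum(difficulties, threshold=0.8, window=20,
                          max_episodes=10_000, eval_episodes=100, seed=0):
    rng = random.Random(seed)
    skill = 0.0                     # stand-in for the agent's learned parameters
    level = 0                       # Step 5: start on the easiest task
    recent = deque(maxlen=window)   # outcomes on the current task only
    episodes_used = 0

    for _ in range(max_episodes):
        episodes_used += 1
        success = rng.random() < success_probability(skill, difficulties[level])
        recent.append(success)
        if success:
            skill += 0.01           # toy "learning" update

        # Step 4: criterion for moving on (success rate over a full window)
        mastered = len(recent) == window and sum(recent) / window >= threshold
        if mastered:
            if level == len(difficulties) - 1:
                break               # Step 7: the full task is learned
            level += 1              # Step 6: move to the next task
            recent.clear()

    # Step 8: evaluate on the full (hardest) task only, with no further learning
    full_task = difficulties[-1]
    wins = sum(rng.random() < success_probability(skill, full_task)
               for _ in range(eval_episodes))
    return {"final_level": level, "episodes": episodes_used,
            "eval_success_rate": wins / eval_episodes}


# Steps 1-3: the full task has difficulty 1.0; easier variants come first.
result = train_with_curriculum(difficulties=[0.2, 0.6, 1.0])
```

Because the tasks here differ only in one difficulty number, progress on one level carries over directly to the next, which is the "share features" condition in its simplest form.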
Core Concepts and Methods Behind Curriculum Learning
- Task Decomposition – split hard tasks into manageable parts
- Reward Shaping – temporarily modify rewards to help learning
- Transfer Learning – agent reuses knowledge from previous tasks
- Progressive Complexity – controlled difficulty increase
- Goal Sampling – select easier goals first (related goal-selection ideas appear in HER and Go-Explore)
- Self-Paced Learning – agent chooses when to move forward
- Teacher-Student Model – an external teacher chooses the next lesson
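As a rough illustration of the Teacher-Student idea, the sketch below lets a teacher pick the task on which the student's score changed the most recently, with occasional random exploration. It is a deliberately simplified sketch and does not reproduce any published algorithm. The class name `ProgressTeacher`, the `epsilon` parameter, and the progress measure are assumptions made for the example.

```python
import random


class ProgressTeacher:
    """Chooses the next task based on the student's recent score change."""

    def __init__(self, n_tasks: int, epsilon: float = 0.1, seed: int = 0):
        self.last_score = [0.0] * n_tasks   # most recent score per task
        self.progress = [0.0] * n_tasks     # most recent score change per task
        self.epsilon = epsilon              # probability of a random task
        self.rng = random.Random(seed)

    def choose_task(self) -> int:
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.progress))
        # Prefer the task with the largest absolute change; break ties randomly.
        best = max(abs(p) for p in self.progress)
        candidates = [i for i, p in enumerate(self.progress) if abs(p) == best]
        return self.rng.choice(candidates)

    def update(self, task: int, score: float) -> None:
        self.progress[task] = score - self.last_score[task]
        self.last_score[task] = score


# Example: the student's score on task 1 jumped, so the teacher picks it next.
teacher = ProgressTeacher(n_tasks=3, epsilon=0.0)
teacher.update(task=1, score=0.5)
chosen = teacher.choose_task()
```

Using the absolute change means the teacher also returns to tasks where the score dropped, which is one simple way to react to forgetting.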
References
- Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009). Curriculum Learning. ICML.
- Matiisen, T., Oliver, A., Cohen, T., & Schulman, J. (2017). Teacher-Student Curriculum Learning. arXiv preprint arXiv:1707.00183.
- Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction. MIT Press.