This page was last edited on 12 November 2025
Problem classification is about identifying the type of task we’re trying to solve in Reinforcement Learning (RL). It answers the question: “What kind of RL problem is this?” The answer helps us pick the right algorithm, architecture, and evaluation method for our application.
In Machine Learning, classification means sorting data into labeled categories. In Deep Reinforcement Learning, we classify the problems themselves along these axes:
- Prediction vs. Control
  - Prediction – Estimate the value of a state or action.
    Example: How good is this state? What total reward do I expect starting from here?
    Used in policy evaluation and value estimation.
  - Control – Learn a policy to maximize total reward.
    Example: What’s the best action to take from this state?
    Used in training agents that interact and improve over time.
- Online vs. Offline
  - Online – Learn while interacting with the environment.
    Example: A robot updates its policy as it moves through a room.
    Common in real-time RL tasks.
  - Offline – Learn from a fixed dataset of past experiences.
    Example: Train a driving policy using logged car sensor data.
    Used when real-world interaction is expensive or unsafe.
- Single-Agent vs. Multi-Agent
  - Single-Agent – One agent learns and acts in the environment.
    Example: A drone learns to navigate through a forest.
  - Multi-Agent – Multiple agents learn and act, possibly competing or cooperating.
    Example: Two robots coordinate to move a heavy object.
    Can be cooperative, competitive, or mixed.
- Fully Observable vs. Partially Observable
  - Fully Observable – The agent sees the complete environment state.
    Example: A chess engine knows the full board at every step.
  - Partially Observable – The agent has incomplete or noisy information.
    Example: A cleaning robot can only see nearby rooms, not the whole house.
- Discrete vs. Continuous actions
  - Discrete – The agent chooses from a finite set of actions.
    Example: {Left, Right, Stay}
  - Continuous – The agent picks actions from a range of values.
    Example: Set motor speed to 2.4 m/s or steering angle to 30.5°
- Stationary vs. Non-Stationary environment
  - Stationary – The environment’s rules and dynamics don’t change.
    Example: A maze layout stays the same every episode.
  - Non-Stationary – The environment evolves over time.
    Example: A stock trading agent where market conditions shift.
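One way to make these six axes concrete is to record one answer per axis in a small data structure. The sketch below is illustrative only: the class names, field names, and the two maze profiles are assumptions made for this example, not part of any RL library. Only the "stationary maze" idea comes from the list above; the remaining axis values for the maze are guesses chosen to keep the example simple.

```python
from dataclasses import dataclass, fields, replace
from enum import Enum


class Task(Enum):
    PREDICTION = "prediction"
    CONTROL = "control"


class Data(Enum):
    ONLINE = "online"
    OFFLINE = "offline"


class Agents(Enum):
    SINGLE = "single-agent"
    MULTI = "multi-agent"


class Observability(Enum):
    FULL = "fully observable"
    PARTIAL = "partially observable"


class Actions(Enum):
    DISCRETE = "discrete"
    CONTINUOUS = "continuous"


class Dynamics(Enum):
    STATIONARY = "stationary"
    NON_STATIONARY = "non-stationary"


@dataclass(frozen=True)
class ProblemProfile:
    """One answer per classification axis (hypothetical helper type)."""
    task: Task
    data: Data
    agents: Agents
    observability: Observability
    actions: Actions
    dynamics: Dynamics

    def differs_from(self, other):
        """Return the names of the axes on which two problems disagree."""
        return [
            f.name
            for f in fields(self)
            if getattr(self, f.name) != getattr(other, f.name)
        ]


# Assumed profile for a fixed-layout maze (axis values are illustrative).
maze = ProblemProfile(
    Task.CONTROL, Data.ONLINE, Agents.SINGLE,
    Observability.FULL, Actions.DISCRETE, Dynamics.STATIONARY,
)

# Hypothetical variant: the same maze, but the layout changes over time.
shifting_maze = replace(maze, dynamics=Dynamics.NON_STATIONARY)

print(maze.differs_from(shifting_maze))
```

Comparing two profiles this way shows which axis separates two otherwise similar problems, which is usually the axis that decides whether the same tools can be reused.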
This is not about classifying images; it is about classifying the RL problem itself.
Think of problem classification as a step that comes before defining the Markov Decision Process (MDP).
- First you classify: “Is the environment fully observable?” → Yes → MDP.
- “Is it partially observable?” → Yes → Partially Observable Markov Decision Process (POMDP).
- Once classified, you can write the exact mathematical model (MDP or POMDP).
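For reference, the two models are commonly written as tuples. The symbols below follow a widespread textbook convention and are an assumption here, since this page does not fix a notation: \mathcal{S} is the state set, \mathcal{A} the action set, P the transition model, R the reward function, \gamma the discount factor, \Omega the observation set, and O the observation model.

```latex
\begin{aligned}
\text{MDP:}\quad   & (\mathcal{S}, \mathcal{A}, P, R, \gamma),
  && P(s' \mid s, a),\quad R(s, a) \\
\text{POMDP:}\quad & (\mathcal{S}, \mathcal{A}, P, R, \Omega, O, \gamma),
  && O(o \mid s', a)
\end{aligned}
```

The POMDP keeps everything the MDP has and adds \Omega and O, because the agent receives an observation o instead of the state itself.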
Why do we use problem classification in Deep RL?
Because Deep RL problems are not all the same.
Problem classification in deep reinforcement learning:
- Guides algorithm selection (e.g., Q-Learning vs. PPO)
- Helps in model design (e.g., discrete vs. continuous policy outputs)
- Ensures correct evaluation metrics
- Makes experiments reproducible
- Helps identify whether exploration is needed
In short: classification = clarity. Without it, we risk using the wrong tools.
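As a rough illustration of how one axis guides tool choice, the sketch below maps the discrete vs. continuous action axis to candidate algorithms and a policy output shape. The function name `suggest_tools` and the returned strings are hypothetical, and the mapping is a rule of thumb, not a definitive recommendation: Q-Learning is listed only for discrete actions because it takes a maximum over an enumerable action set, while PPO can be used with either action type.

```python
def suggest_tools(continuous_actions):
    """Return coarse starting points for one classification axis.

    A rule-of-thumb sketch, not an exhaustive or authoritative mapping.
    """
    if continuous_actions:
        # Q-Learning is omitted: it needs a max over an enumerable action set.
        return {
            "algorithms": ["PPO"],
            "policy_output": "Gaussian mean and std per action dimension",
        }
    return {
        "algorithms": ["Q-Learning", "PPO"],
        "policy_output": "one Q-value or logit per action",
    }


discrete_choice = suggest_tools(continuous_actions=False)
continuous_choice = suggest_tools(continuous_actions=True)

print(discrete_choice["algorithms"])
print(continuous_choice["policy_output"])
```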
ANALOGY
To better understand problem classification, imagine a mechanic.
Different vehicles, such as a bicycle, a truck, or an electric scooter, need different tools.
Before fixing anything, the mechanic first classifies what kind of vehicle they are dealing with.
The same applies in Deep RL. We classify the “type” of RL problem so we can choose the right “tools” (algorithms, models, strategies) to solve it.
HISTORY
Problem classification wasn’t formalized in early RL. Back then (1980s–1990s), most problems were small, discrete, and fully observable.
As RL matured (especially with Deep RL post-2013), the diversity of problems exploded.
Researchers realized they needed a way to sort problems, because not all environments were Atari games. So classification systems evolved, mostly from OpenAI Gym and the RL literature.
First major structured use: DeepMind’s papers on Atari and AlphaGo started showing different problem types explicitly.
Inputs and outputs of a problem classification
Inputs:
- Environment description
- Observation and action spaces
- Reward function
- Agent interaction rules
- Transition and observation models
Outputs:
- A labeled problem type (e.g., “Partially Observable, Continuous Action, Multi-Agent Control Task”)
- Recommendations for:
  - Model type
  - Algorithm
  - Training method
  - Evaluation strategy
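The input-to-output mapping above can be sketched as a small function. Everything here is an assumption made for illustration: the dictionary keys, the function name `classify`, and the wording of the recommendations are hypothetical, and a real classification would draw on the full environment description rather than four flags. The memory hint for partially observable problems reflects common practice and is not a rule stated on this page.

```python
def classify(env):
    """Turn a plain environment description into a label and coarse hints."""
    observability = (
        "Fully Observable" if env["agent_sees_full_state"]
        else "Partially Observable"
    )
    actions = (
        "Continuous Action" if env["action_space"] == "continuous"
        else "Discrete Action"
    )
    agents = "Multi-Agent" if env["num_agents"] > 1 else "Single-Agent"
    task = "Control" if env["learns_policy"] else "Prediction"

    return {
        "label": f"{observability}, {actions}, {agents} {task} Task",
        # Assumed hint: partial observability often calls for some memory.
        "model_hint": (
            "no memory required" if env["agent_sees_full_state"]
            else "consider a policy with memory"
        ),
        "training_method": (
            "online interaction" if env["can_interact"]
            else "offline, from logged data"
        ),
    }


# Hypothetical description of two robots moving a heavy object together.
robots = {
    "agent_sees_full_state": False,
    "action_space": "continuous",
    "num_agents": 2,
    "learns_policy": True,
    "can_interact": True,
}

result = classify(robots)
print(result["label"])
```

With these inputs, the label matches the example output format given in the list above.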
References:
- Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction. MIT Press.
- OpenAI Gym documentation.