This page was last edited on 04 November 2025
Before we start using Reinforcement Learning (RL) to train an agent, we must first understand the fundamentals. Vectors are one of the most important concepts. Many AI techniques, including RL, would not work without vectors.
Vectors provide a solid foundation for data representation and training. In Artificial Intelligence, we use vectors to represent all types of data — images, text, sound. Complex data becomes a simple list of numbers that models can process.
For example, if we describe a drone’s position by saying “it’s in front of me,” it’s not enough. It’s vague. But if we say “10 meters forward, 5 meters right, and 2 meters up,” it becomes clear. That set of numbers (10, 5, 2) is a vector.
The 3D graph below shows exactly this. The drone’s position is a vector starting from the origin (0, 0, 0) and pointing to (x, y, z). The red dot shows the position. The blue arrow shows the direction and size of the vector.
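The drone example can be checked numerically. The sketch below is a minimal illustration in plain Python (the variable names are chosen here, not taken from the article); it computes the length of the arrow and its direction as a unit vector.

```python
import math

# Drone position from the example: 10 m forward, 5 m right, 2 m up.
position = [10.0, 5.0, 2.0]

# Length of the arrow from the origin (0, 0, 0) to the drone (Euclidean norm).
distance = math.sqrt(sum(c * c for c in position))

# Unit vector: same direction as the position vector, but with length 1.
direction = [c / distance for c in position]

print(f"distance from origin: {distance:.2f} m")
print("direction:", [round(c, 3) for c in direction])
```

The position vector therefore carries two pieces of information at once: how far away the drone is and in which direction.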

In Reinforcement Learning, multiple types of vectors are used to represent different aspects of an agent’s state, actions, and learning process. Below are the types of vectors commonly used in RL.
1. Feature Vector
A feature vector is a numerical representation of an object (e.g., an image, a house, a customer). It is used to store features for machine learning (ML) models.
ANALOGY
Think of a feature vector like a passport: it contains structured details about a person (name, age, nationality). Similarly, a feature vector contains structured details about an object.
EXAMPLE 1: Feature vector for a house
To create a feature vector, we select relevant features of an object and organize them into a vector.
The feature vector is:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \textbf{\large x} = \begin{bmatrix} 120 & 3 & 5 & 8.5 \end{bmatrix} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-dc1d127d8b15e9d55f1013f8a329674e_l3.png)
Where:
- 120 – floor area in m²
- 3 – number of rooms
- 5 – age in years
- 8.5 – location score
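As a rough sketch of how such a vector could be built in code, the snippet below turns a raw record into an ordered list of numbers. The dictionary keys and the helper name `to_feature_vector` are hypothetical names chosen for this illustration.

```python
# Hypothetical raw record for one house; the keys are illustrative names.
house = {"area_m2": 120, "rooms": 3, "age_years": 5, "location_score": 8.5}

# Fix the feature order once so every house maps to the same positions.
FEATURE_ORDER = ["area_m2", "rooms", "age_years", "location_score"]


def to_feature_vector(record):
    """Return the features of `record` as a list of floats in FEATURE_ORDER."""
    return [float(record[name]) for name in FEATURE_ORDER]


x = to_feature_vector(house)
print(x)
```

Keeping the order fixed matters: a model only sees positions, so position 0 must always mean the same feature.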
2. State Vector (Observation Vector)
A state vector represents the current state of an agent at a specific time t in a Reinforcement Learning (RL) environment. This time dependence is crucial because the state can change depending on the agent’s actions and the dynamics of the environment. We can construct a state vector by extracting the relevant information of an agent and storing it in a vector.
ANALOGY
Imagine a pilot’s dashboard in a plane that shows altitude, speed, and direction. The state vector is like that dashboard—it tells the RL agent everything about the current environment.
EXAMPLE 2: Construct a state vector for a self-driving car
A self-driving car at time t might have this state vector:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \textbf{\large s}_{\mathbf{t}} = \begin{bmatrix} 80 & 0.1 & 15 \end{bmatrix} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-42a7bd8a37f971295b9da089d36ec062_l3.png)
Where:
- 80 – speed in km/h
- 0.1 – steering angle in radians
- 15 – distance to the nearest car in meters
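One possible way to keep a state vector readable in code is to name its fields first and flatten them only when the agent needs the numbers. The sketch below assumes a hypothetical `CarState` class; it is not part of any RL library.

```python
from dataclasses import dataclass, astuple


@dataclass
class CarState:
    """Named fields for the self-driving car example (hypothetical class)."""
    speed_kmh: float
    steering_rad: float
    gap_m: float  # distance to the nearest car


# State observed at time step t.
s_t = CarState(speed_kmh=80.0, steering_rad=0.1, gap_m=15.0)

# Flatten to the plain vector that a learning algorithm would consume.
state_vector = list(astuple(s_t))
print(state_vector)
```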
3. Action Vector
The action vector encodes the agent's choice among the actions available to it. It is formed by mapping each action to a number or an encoding in the vector. Actions can be discrete or continuous, so the shape of the vector depends on the action space and on the algorithm.
In RL, actions are taken sequentially in time, so each action depends on the current state and can influence future states. The notation a_t means that the action is taken at time step t during the training or execution of the agent. This is essential in a Markov Decision Process (MDP), where the state at time t+1 depends on the action a_t taken in state s_t.
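In common MDP notation, this dependence of the next state on the current state and action can be written as follows. The symbol P for the transition distribution is the usual textbook convention and is not defined elsewhere on this page.

```latex
\[
  s_{t+1} \sim P(\,\cdot \mid s_t, a_t)
\]
```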
ANALOGY
Think of an RC car remote:
- Buttons (discrete actions: forward, left, right, stop).
- Joystick (continuous actions: control speed and angle).
EXAMPLE 3: Discrete and continuous actions
In discrete action spaces, an action is often one-hot encoded, with a 1 in the position of the chosen action and 0 everywhere else:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \textbf{\large a}_{\mathbf{t}} = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-97df4fc1b69e0a2999488f8ec0680906_l3.png)
In continuous spaces, actions are real-valued vectors:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \textbf{\large a}_{\mathbf{t}} = \begin{bmatrix} 0.5 & -0.2 \end{bmatrix} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-401da3870b15a58a8be4403d077bafa8_l3.png)
A robotic arm’s action vector may be represented like this for grip strength, rotation angle, and extension length:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \textbf{\large a}_{\mathbf{t}} = \begin{bmatrix} 0.7 & 15 & 10 \end{bmatrix} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-2ebf02c982dc1860e5e59f11d71560ee_l3.png)
Where:
- 0.7 – grip strength (70%)
- 15 – rotation in degrees
- 10 – extension in cm
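The two encodings can be sketched in a few lines of plain Python. The action names and the [-1, 1] bounds are assumptions made for this illustration; real environments define their own action sets and limits.

```python
ACTIONS = ["forward", "left", "right"]  # illustrative action names


def one_hot(action):
    """Encode a discrete action as a vector with a single 1."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return [1 if a == action else 0 for a in ACTIONS]


def clip_action(action, low=-1.0, high=1.0):
    """Clamp each component of a continuous action to [low, high]."""
    return [max(low, min(high, v)) for v in action]


a_discrete = one_hot("forward")          # first position set to 1
a_continuous = clip_action([0.5, -0.2])  # already within bounds
a_clipped = clip_action([1.7, -0.2])     # first component clamped to 1.0

print(a_discrete, a_continuous, a_clipped)
```

A discrete action vector has one position per possible action, while a continuous action vector has one position per controlled quantity.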
4. Policy Vector
A policy vector represents the probability of taking each action in a given state. It is produced by the policy, which outputs one probability per action.
A policy can be defined over both discrete and continuous action spaces.
In discrete action spaces, the vector contains one probability per action, and the probabilities sum to 1.
In continuous action spaces, the policy vector typically holds the parameters (such as means and variances) of a probability distribution, commonly a Gaussian. Actions are then sampled from that distribution.
ANALOGY
Think of a roulette wheel: each slice represents an action, and the larger the slice, the more likely the agent is to choose that action.
EXAMPLE 4: Deciding between actions
An AI in a game deciding between three actions:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \makebox{ \begin{array}{c} \vspace{5mm} \\ \pi(s) = \begin{bmatrix} 0.5 & 0.4 & 0.1 \end{bmatrix} \\ \vspace{5mm} \end{array} } } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-a2078b337035c09bfc6c4a795233f89c_l3.png)
Where:
- 0.5 – probability of attacking (50%)
- 0.4 – probability of defending (40%)
- 0.1 – probability of retreating (10%)
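To see what "choosing according to a policy vector" can look like in practice, the sketch below samples actions with Python's standard `random` module. The seed, the sample count, and the Gaussian parameters are arbitrary values picked for the illustration.

```python
import random

ACTIONS = ["attack", "defend", "retreat"]
policy = [0.5, 0.4, 0.1]  # pi(s) from the example

# A valid discrete policy is non-negative and sums to 1.
assert all(p >= 0 for p in policy)
assert abs(sum(policy) - 1.0) < 1e-9

rng = random.Random(0)  # fixed seed so the run is repeatable


def sample_action(probs):
    """Draw one action, weighted by its probability."""
    return rng.choices(ACTIONS, weights=probs, k=1)[0]


# Over many draws the observed frequencies approach the policy vector.
n_samples = 10_000
counts = {a: 0 for a in ACTIONS}
for _ in range(n_samples):
    counts[sample_action(policy)] += 1
frequencies = {a: counts[a] / n_samples for a in ACTIONS}
print(frequencies)

# Continuous case: the policy outputs distribution parameters instead.
mean, std = 0.5, 0.1  # hypothetical Gaussian parameters
continuous_action = rng.gauss(mean, std)
print(continuous_action)
```

The agent does not always pick the most likely action; over many decisions it picks each action roughly in proportion to its probability.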
5. Weight Vector
The parameters of a machine learning model can be stored in a weight vector. In a linear model such as linear regression, each weight corresponds to one feature, and the prediction is the dot product of the weight vector and the feature vector.
ANALOGY
Think of a recipe – each ingredient has a weight that determines how much to use. In ML, the weight of each feature influences the prediction.
EXAMPLE 5: A weight vector for a house price prediction model
The feature vector used for predicting house prices is:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \textbf{\large x} = \begin{bmatrix} 120 & 3 & 5 & 8.5 \end{bmatrix} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-dc1d127d8b15e9d55f1013f8a329674e_l3.png)
The weight vector that corresponds to the feature vector is:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \textbf{\large w} = \begin{bmatrix} 0.8 & 1.5 & -0.3 & 2.1 \end{bmatrix} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-6a1d9e8595c3c5da089b858067bca845_l3.png)
For these four features, the linear regression model (written here without a bias term) is:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \hat{y} = w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-aecf45531d6de0dfb659df4b805b11fa_l3.png)
Where:
- ŷ – the predicted price
- x_i – the features (house attributes)
- w_i – the corresponding weights
The predicted price is calculated using the dot product (element-wise multiplication and summation) of the weight vector and the feature vector.
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \text{price} = 0.8 \times 120 + 1.5 \times 3 + (-0.3) \times 5 + 2.1 \times 8.5 = 96 + 4.5 - 1.5 + 17.85 = 116.85 \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-79a469b7181d113e97b5f017ac98a2f0_l3.png)
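The same calculation can be reproduced in a few lines of Python. The helper `dot` is written here for illustration; numerical libraries provide equivalent functions.

```python
x = [120, 3, 5, 8.5]       # feature vector from the example
w = [0.8, 1.5, -0.3, 2.1]  # weight vector from the example


def dot(u, v):
    """Dot product: multiply matching elements, then sum the products."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


price = dot(w, x)
print(round(price, 2))  # 116.85
```

The length check reflects a requirement of the dot product: each weight needs exactly one matching feature.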
6. Word Embedding Vector
A word embedding vector is a representation of a word used in Natural Language Processing (NLP). This vector is formed by mapping words to a high-dimensional space where similar words are closer together.
ANALOGY
Imagine a dictionary where words that mean similar things are stored closer together. Word embeddings work in a similar way, but in mathematical space.
EXAMPLE 6: A simplified word vector for “happy”
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \text{happy} = \begin{bmatrix} 0.7 & 0.9 & 0.1 \end{bmatrix} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-12dbbd637d5e147f5fd3a79e55d97e41_l3.png)
A word with a similar meaning, such as “joyful”, would have a vector close to this one.
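One common way to measure how close two word vectors are is cosine similarity. In the sketch below only the vector for "happy" comes from the example; the vectors for "joyful" and "table" are made-up values used purely to illustrate the comparison.

```python
import math

happy = [0.7, 0.9, 0.1]      # from the example
joyful = [0.68, 0.88, 0.15]  # hypothetical: similar meaning
table = [0.1, 0.05, 0.9]     # hypothetical: unrelated meaning


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


sim_joyful = cosine_similarity(happy, joyful)
sim_table = cosine_similarity(happy, table)
print(f"happy vs joyful: {sim_joyful:.3f}")
print(f"happy vs table:  {sim_table:.3f}")
```

With these illustrative values, "joyful" scores much higher than "table", which is the behavior a useful embedding space is expected to show.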
7. Latent Vector (Hidden Representation)
A latent vector is a compressed internal representation of data, used in deep learning models. It can be produced, for example, by the encoder of an autoencoder, which compresses high-dimensional data into a small vector.
ANALOGY
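Think of a ZIP file – the compression reduces the file size while keeping the important information. Unlike a ZIP file, though, a latent vector is usually a lossy summary, so fine details may be discarded.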
EXAMPLE 7: Compressing an image
A 256×256 pixel image can be compressed into a 64-dimensional latent vector without losing the important data.
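To get a feel for the amount of compression involved, the arithmetic can be spelled out. The sketch assumes a single-channel (grayscale) image; a color image would have three times as many input values.

```python
height, width = 256, 256
channels = 1      # assumption: grayscale image
latent_dim = 64   # size of the latent vector from the example

# Number of values in the flattened input image.
input_dim = height * width * channels

# How many input values each latent dimension has to summarize on average.
compression_ratio = input_dim / latent_dim

print(f"input values: {input_dim}")
print(f"latent values: {latent_dim}")
print(f"compression ratio: {compression_ratio:.0f}x")
```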
8. Gradient Vector
A gradient vector contains the partial derivatives of the loss function with respect to each model parameter. It points in the direction in which the loss increases fastest, and its magnitude indicates how steep that increase is. Gradient descent uses it to optimize ML models by moving the parameters in the opposite direction.
ANALOGY
Think of hiking downhill. The gradient points uphill, so stepping in the opposite direction is the steepest way down toward the lowest point (optimal model weights).
EXAMPLE 8: How a gradient vector may look in a neural network
In training a neural network, the gradient vector might look like:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \nabla w = \begin{bmatrix} -0.02 & 0.1 & -0.05 \end{bmatrix} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-3a47df1a47ec39799c81e7d23e9ee2e1_l3.png)
This tells the model how to adjust weights in the next step.
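A single gradient descent update with this gradient could look like the sketch below. The current weights and the learning rate are hypothetical values chosen for the illustration.

```python
grad = [-0.02, 0.1, -0.05]  # gradient vector from the example
w = [0.5, -0.3, 0.8]        # hypothetical current weights
learning_rate = 0.1         # hypothetical step size

# Gradient descent: move each weight against its gradient component.
w_next = [wi - learning_rate * gi for wi, gi in zip(w, grad)]

print([round(v, 4) for v in w_next])
```

A negative gradient component increases the corresponding weight, and a positive component decreases it, which is why the second weight goes down while the other two go up.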
Limitations of Vectors
The eight types of vectors above show that vectors are a powerful tool in RL, but they also come with several limitations.
Below is a list of the most important drawbacks:
1. High-Dimensionality Problem
When dealing with complex environments, state and action vectors can become extremely large. This is a problem because large vectors require more memory and longer processing times.
EXAMPLE 1: A robot with 100 sensors might have a 100-dimensional state vector, making learning difficult.
EXAMPLE 2: In autonomous driving, the state vector may include pixel data from a camera, leading to millions of features. Standard RL struggles with such high-dimensional vectors.
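The difference in scale between the two examples is easy to quantify. The camera resolution and the 4-byte float storage below are assumptions made for this illustration.

```python
sensor_state_dim = 100  # robot with 100 sensors

# Hypothetical RGB camera frame: 1280 x 720 pixels, 3 color channels.
height, width, channels = 720, 1280, 3
pixel_state_dim = height * width * channels

# Memory for one state if each value is stored as a 4-byte float.
bytes_per_value = 4
pixel_state_bytes = pixel_state_dim * bytes_per_value

print(f"sensor state: {sensor_state_dim} values")
print(f"pixel state:  {pixel_state_dim} values")
print(f"one pixel state: {pixel_state_bytes / 1024**2:.1f} MiB")
```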
2. Loss of Structure & Relationships
A vector represents data as a flat, ordered list of numbers. However, real-world data often has spatial or relational structure (e.g., images, graphs). Flattening such data into a vector discards spatial and temporal relationships, so for images a Convolutional Neural Network (CNN) is often needed to extract meaningful patterns.
EXAMPLE 3: A game board (chess, Go) has a spatial structure that is not well captured by a vector.
EXAMPLE 4: In image-based RL, an agent trained on raw pixel vectors will struggle to recognize objects. A common solution is to use CNNs to process images into structured features before converting them into vectors.
3. Discrete vs. Continuous Action Spaces
Some RL problems involve continuous actions (e.g., adjusting a robot’s joint angles). Representing continuous actions as discrete vectors can lead to poor performance. Discretization of actions leads to loss of precision. Fine motor control in robotics needs continuous action vectors.
EXAMPLE 5: A robotic arm needs to rotate by precise degrees (e.g., 3.56°), but a discrete vector can only store fixed angles (e.g., 3° or 4°).
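The loss of precision can be made concrete with a small sketch. The helper `discretize` is a hypothetical function that snaps an angle to the nearest multiple of a step size.

```python
def discretize(angle_deg, step_deg=1.0):
    """Snap a continuous angle to the nearest multiple of step_deg."""
    return round(angle_deg / step_deg) * step_deg


target = 3.56  # desired rotation in degrees

coarse = discretize(target, step_deg=1.0)   # only whole degrees allowed
fine = discretize(target, step_deg=0.01)    # a much finer grid

coarse_error = abs(target - coarse)
fine_error = abs(target - fine)

print(f"coarse: {coarse} (error {coarse_error:.2f} degrees)")
print(f"fine:   {fine:.2f} (error {fine_error:.2f} degrees)")
```

A finer grid reduces the error but multiplies the number of discrete actions, which is the trade-off that makes continuous action vectors attractive for fine motor control.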
4. Fixed-Length Limitation
RL tasks often involve changing environments, where the number of relevant features may increase or decrease over time. Fixed-length vectors cannot handle variable input sizes efficiently. One solution is to replace fixed-length vectors with graph-based representations (e.g., Graph Neural Networks).
EXAMPLE 6: In multi-agent RL, the number of agents may change dynamically, but the state vector remains fixed-size, making adaptation difficult.
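A common workaround, sketched below, is to pad the state up to a fixed maximum and add a mask that marks which slots are real. The capacity `MAX_AGENTS`, the two features per agent, and the helper `build_state` are all assumptions made for this illustration.

```python
MAX_AGENTS = 4          # hypothetical capacity of the fixed-size state
FEATURES_PER_AGENT = 2  # e.g. an (x, y) position per agent


def build_state(agent_positions):
    """Pad a variable number of agents into a fixed-length state and mask."""
    if len(agent_positions) > MAX_AGENTS:
        raise ValueError("more agents than the fixed-size state can hold")
    state, mask = [], []
    for i in range(MAX_AGENTS):
        if i < len(agent_positions):
            state.extend(agent_positions[i])  # real agent
            mask.append(1)
        else:
            state.extend([0.0] * FEATURES_PER_AGENT)  # padding slot
            mask.append(0)
    return state, mask


state, mask = build_state([(1.0, 2.0), (3.0, 4.0)])
print(state)
print(mask)
```

Padding keeps the vector length constant, but it wastes space when few agents are present and fails outright once the number of agents exceeds the chosen capacity.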
5. Poor Generalization to Unseen States
Vectors store exact numerical values, but they don’t inherently capture relationships or similarities between states. If an agent encounters a new state it has never seen before, the vector representation provides no context. This is a problem since RL requires generalization to new environments. Standard vector representations lead to overfitting to specific training conditions.
EXAMPLE 7: A robot trained in a simulation may perform poorly in the real world because vector-based states don’t adapt well to new inputs. A common remedy is to use learned state embeddings that capture higher-level similarities between different states.
6. Computational Complexity in Large Environments
In high-dimensional spaces, vector-based calculations (e.g., distance, gradient updates) become expensive, which significantly increases training time for large state/action vectors. Many RL tasks require millions of training iterations, making computations costly. A common remedy is to use dimensionality reduction techniques (PCA, autoencoders) to reduce vector size. Another is to speed up training with parallel computing.
EXAMPLE 8: AlphaGo combined deep learning with RL and relied on large-scale distributed hardware, with many CPUs and GPUs, to train and run efficiently.
References:
- Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction. MIT Press.
- Strang, G. (2006). Linear Algebra and Its Applications (4th ed.). Cengage Learning.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
- Deisenroth, M. P., Faisal, A. A., & Ong, C. S. (2020). Mathematics for Machine Learning. Cambridge University Press.
- “Welcome to Spinning Up in Deep RL!” – OpenAI
- “Graph neural networks: A review of methods and applications” – ScienceDirect