Feature Vector, State Vector, Action Vector, Policy Vector, Weight Vector and Example

This page was last edited on 04 November 2025

Before we start using Reinforcement Learning (RL) to train an agent, we must first understand the fundamentals. Vectors are one of the most important concepts. Many AI techniques, including RL, would not work without vectors.

Vectors provide a solid foundation for data representation and training. In Artificial Intelligence, we use vectors to represent all types of data — images, text, sound. Complex data becomes a simple list of numbers that models can process.

For example, if we describe a drone’s position by saying “it’s in front of me,” it’s not enough. It’s vague. But if we say “10 meters forward, 5 meters right, and 2 meters up,” it becomes clear. That set of numbers (10, 5, 2) is a vector.

The 3D graph below shows exactly this. The drone’s position is a vector starting from the origin (0, 0, 0) and pointing to (x, y, z). The red dot shows the position. The blue arrow shows the direction and size of the vector.

Drone Position as a 3D Vector

In Reinforcement Learning, multiple types of vectors are used to represent different aspects of an agent’s state, actions, and learning process. Below are the types of vectors commonly used in RL.

1. Feature Vector

A feature vector is a numerical representation of an object (e.g., an image, a house, a customer). It stores the features that a machine learning (ML) model takes as input.

ANALOGY

Think of a feature vector like a passport: it contains structured details about a person (name, age, nationality). Similarly, a feature vector contains structured details about an object.

To create a feature vector, we select relevant features of an object and organize them into a vector.

For example, the feature vector of a house might be:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \textbf{\large x} = \begin{bmatrix} 120 & 3 & 5 & 8.5 \end{bmatrix} \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

Where:

  • 120 – area in m²
  • 3 – number of rooms
  • 5 – age in years
  • 8.5 – location score
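As a minimal sketch of how such a vector could be assembled, the snippet below selects features from a record and places them in a fixed order. The dictionary keys and the helper `to_feature_vector` are illustrative names, not part of any library.

```python
# Hypothetical raw description of a house; the keys are illustrative names.
house = {
    "area_m2": 120,
    "rooms": 3,
    "age_years": 5,
    "location_score": 8.5,
}

# Every house must use the same order, so that position i of the vector
# always means the same thing to the model.
FEATURE_ORDER = ["area_m2", "rooms", "age_years", "location_score"]


def to_feature_vector(record, order=FEATURE_ORDER):
    """Return the selected features as a list of floats in a fixed order."""
    return [float(record[name]) for name in order]


x = to_feature_vector(house)
print(x)  # [120.0, 3.0, 5.0, 8.5]
```

Keeping the order in one place makes it harder to accidentally swap two features between training and prediction.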

2. State Vector (Observation Vector)

A state vector represents the current state of an agent at a specific time t in a Reinforcement Learning (RL) environment. This time dependence is crucial because the state can change depending on the agent’s actions and the dynamics of the environment. We can construct a state vector by extracting the relevant information of an agent and storing it in a vector.

ANALOGY

Imagine a pilot’s dashboard in a plane that shows altitude, speed, and direction. The state vector is like that dashboard: it gives the RL agent the readings it needs about the current situation.

A self-driving car at time t might have this state vector:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \textbf{\large s}_{\mathbf{t}} = \begin{bmatrix} 80 & 0.1 & 15 \end{bmatrix} \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

Where:

  • 80 – speed in km/h
  • 0.1 – steering angle in radians
  • 15 – distance to the nearest car in meters
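To illustrate the time dependence, here is a toy sketch of how the state vector could change from time t to t+1. The transition rule, the lead car's speed, and the function name `step` are assumptions made up for this example; a real environment would define its own dynamics.

```python
def step(state, accel_kmh_per_s, lead_speed_kmh=70.0, dt=1.0):
    """Toy transition: returns the state after dt seconds.

    state = [speed in km/h, steering angle in rad, gap to the car ahead in m].
    The lead car is assumed to drive at a constant speed.
    """
    speed, steering, gap = state
    closing_speed_ms = (speed - lead_speed_kmh) / 3.6  # km/h -> m/s
    new_speed = speed + accel_kmh_per_s * dt
    new_gap = gap - closing_speed_ms * dt
    return [new_speed, steering, new_gap]


s_t = [80.0, 0.1, 15.0]                   # state vector from the text
s_next = step(s_t, accel_kmh_per_s=-5.0)  # the agent brakes gently
print([round(v, 2) for v in s_next])      # [75.0, 0.1, 12.22]
```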

3. Action Vector

The action vector encodes an action choice available to an RL agent. It is formed by mapping each action to a number or an encoding in the vector. Actions can be discrete or continuous, so the shape of the vector depends on the action space and the algorithm.

In RL, actions are taken sequentially in time, so each action depends on the current state and can influence future states. The notation a_t means that the action is taken at time step t during the training or execution of the agent. This is essential in a Markov Decision Process (MDP), where the state at time t+1 depends on the action a_t taken in state s_t.

ANALOGY

Think of an RC car remote:

  • Buttons (discrete actions: forward, left, right, stop).
  • Joystick (continuous actions: control speed and angle).

In discrete action spaces, an action is often one-hot encoded. For example, choosing the first of three actions gives:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \textbf{\large a}_{\mathbf{t}} = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

In continuous spaces, actions are real-valued vectors:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \textbf{\large a}_{\mathbf{t}} = \begin{bmatrix} 0.5 & -0.2 \end{bmatrix} \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

A robotic arm’s action vector may be represented like this for grip strength, rotation angle, and extension length:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \textbf{\large a}_{\mathbf{t}} = \begin{bmatrix} 0.7 & 15 & 10 \end{bmatrix} \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

Where:

  • 0.7 – grip strength (70%)
  • 15 – rotation in degrees
  • 10 – extension in cm
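The discrete and continuous encodings above can be sketched in a few lines. The action names, the [-1, 1] range, and the helpers `one_hot` and `clip_action` are assumptions chosen for illustration.

```python
ACTIONS = ["forward", "left", "right"]  # hypothetical discrete action set


def one_hot(action, actions=ACTIONS):
    """Encode a discrete action as a vector with a single 1."""
    return [1 if name == action else 0 for name in actions]


def clip_action(action, low, high):
    """Clamp each component of a continuous action to its allowed range."""
    return [max(lo, min(hi, a)) for a, lo, hi in zip(action, low, high)]


a_discrete = one_hot("forward")
print(a_discrete)  # [1, 0, 0]

# Continuous action [throttle, steering], each assumed to lie in [-1, 1].
low, high = [-1.0, -1.0], [1.0, 1.0]
a_continuous = clip_action([0.5, -0.2], low, high)
print(a_continuous)  # [0.5, -0.2]

# An out-of-range request is clamped to the nearest valid value.
a_clamped = clip_action([1.4, -0.2], low, high)
print(a_clamped)  # [1.0, -0.2]
```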

4. Policy Vector

A policy vector represents the probability of taking each action in a given state. It is the output of the policy, which maps a state to one probability per action.

A policy vector can be defined for both discrete and continuous action spaces.

In discrete action spaces, the vector contains probability values for each action, ensuring that they sum to 1.

In continuous action spaces, the policy vector can represent the parameters (such as means and variances) of a probability distribution, commonly a Gaussian. In this case, actions are sampled from the distribution that those parameters define.

ANALOGY

Think of a roulette wheel: each slice represents an action, and the larger the slice, the more likely the agent is to choose that action.

An AI in a game deciding between three actions:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \pi(s) = \begin{bmatrix} 0.5 & 0.4 & 0.1 \end{bmatrix} \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

Where:

  • 0.5 – probability of attacking (50%)
  • 0.4 – probability of defending (40%)
  • 0.1 – probability of retreating (10%)
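Here is a small sketch of how a policy vector could be used to pick an action. The softmax step is one common way to turn raw scores into probabilities; the scores, the Gaussian mean and standard deviation, and the seed are made-up values for illustration.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtracting the max improves numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


ACTIONS = ["attack", "defend", "retreat"]
rng = random.Random(0)  # seeded only to make the sketch reproducible

# Discrete case: sample an action from the policy vector given in the text.
pi_s = [0.5, 0.4, 0.1]
action = rng.choices(ACTIONS, weights=pi_s, k=1)[0]

# A policy vector can also be derived from scores (these logits are made up).
probs = softmax([2.0, 1.0, 0.1])

# Continuous case: the policy outputs a mean and a standard deviation,
# and the action is drawn from the resulting Gaussian.
mean, std = 0.5, 0.1  # illustrative values
steering = rng.gauss(mean, std)

print(action, [round(p, 3) for p in probs], round(steering, 3))
```

Sampling, rather than always taking the most likely action, lets the agent keep exploring less likely actions.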

5. Weight Vector

The parameters of a machine learning model are stored in a weight vector, with one weight per feature. In linear regression, the prediction is the dot product of the weight vector and the feature vector.

ANALOGY

Think of a recipe – each ingredient has a weight that determines how much to use. In ML, the weight of each feature influences the prediction.

The feature vector used for predicting house prices is:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \textbf{\large x} = \begin{bmatrix} 120 & 3 & 5 & 8.5 \end{bmatrix} \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

The weight vector that corresponds to the feature vector is:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \textbf{\large w} = \begin{bmatrix} 0.8 & 1.5 & -0.3 & 2.1 \end{bmatrix} \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

The general formula for a linear regression model is:

     \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \hat{y} = w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 \\ \vspace{5mm} \end{array} } \hspace{5mm} \]

Where:

  • ŷ – the predicted price
  • x_i – the features (house attributes)
  • w_i – the corresponding weights

The predicted price is calculated using the dot product (element-wise multiplication and summation) of the weight vector and the feature vector. 

     \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \text{price} = 0.8 \times 120 + 1.5 \times 3 + (-0.3) \times 5 + 2.1 \times 8.5 = 96 + 4.5 - 1.5 + 17.85 = 116.85 \\ \vspace{5mm} \end{array} } \hspace{5mm} \]
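The same calculation can be checked with a few lines of code. This is a minimal sketch; the helper name `dot` is our own.

```python
x = [120, 3, 5, 8.5]        # feature vector from the text
w = [0.8, 1.5, -0.3, 2.1]   # weight vector from the text


def dot(u, v):
    """Dot product: multiply element-wise, then sum."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(ui * vi for ui, vi in zip(u, v))


price = dot(w, x)
print(round(price, 2))  # 116.85
```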


6. Word Embedding Vector

A word embedding vector is a representation of a word used in Natural Language Processing (NLP). This vector is formed by mapping words to a high-dimensional space where similar words are closer together.

ANALOGY

Imagine a dictionary where words that mean similar things are stored closer together. Word embeddings work in a similar way, but in mathematical space.

     \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \text{happy} = \begin{bmatrix} 0.7 & 0.9 & 0.1 \end{bmatrix} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]

A similar word like “joyful” would map to a nearby vector.
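One common way to measure how close two embedding vectors are is cosine similarity. In the sketch below, only the vector for “happy” comes from the text; the vectors for “joyful” and “table” are hypothetical values chosen to illustrate the idea.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


happy = [0.7, 0.9, 0.1]       # from the text
joyful = [0.65, 0.85, 0.15]   # hypothetical vector for a similar word
table = [0.1, 0.2, 0.9]       # hypothetical vector for an unrelated word

sim_close = cosine_similarity(happy, joyful)
sim_far = cosine_similarity(happy, table)
print(sim_close > sim_far)  # True
```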


7. Latent Vector (Hidden Representation)

A latent vector is a compressed internal representation of data, used in deep learning models. It is often produced by the encoder of an autoencoder, which compresses high-dimensional data into a small vector.

ANALOGY

Think of a ZIP file – the compression reduces the file size while keeping the important information.

A 256×256 pixel image can be compressed into a 64-dimensional latent vector while keeping most of the important information.
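To make the sizes concrete, the sketch below compresses a 256×256 image into 64 numbers. Note the swap: an autoencoder would learn this mapping from data, whereas here a fixed block average stands in for the encoder purely to show the change in dimensionality. The random image and the names `encode`, `GRID`, and `BLOCK` are assumptions.

```python
import random

SIZE = 256             # the image is SIZE x SIZE pixels
GRID = 8               # 8 x 8 blocks -> 64 latent values
BLOCK = SIZE // GRID   # each block covers 32 x 32 pixels


def encode(image):
    """Stand-in encoder: average each BLOCK x BLOCK region of the image."""
    latent = []
    for by in range(GRID):
        for bx in range(GRID):
            total = 0.0
            for y in range(by * BLOCK, (by + 1) * BLOCK):
                row = image[y]
                for x in range(bx * BLOCK, (bx + 1) * BLOCK):
                    total += row[x]
            latent.append(total / (BLOCK * BLOCK))
    return latent


rng = random.Random(0)
image = [[rng.random() for _ in range(SIZE)] for _ in range(SIZE)]  # fake image

z = encode(image)
print(len(image) * len(image[0]), "->", len(z))  # 65536 -> 64
```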


8. Gradient Vector

A gradient vector gives the direction and rate of steepest increase of the loss with respect to the model parameters. It is formed by computing the partial derivative of the loss function with respect to each model parameter. Gradient descent uses it to optimize ML models by moving the parameters in the opposite direction.

ANALOGY

Think of hiking downhill. The gradient points toward the steepest ascent, so stepping the opposite way takes you toward the lowest point (optimal model weights).

In training a neural network, the gradient vector might look like:

     \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \nabla w = \begin{bmatrix} -0.02 & 0.1 & -0.05 \end{bmatrix} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]

This tells the model how to adjust weights in the next step.
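A single gradient descent update can be sketched as follows. The gradient is the one from the text; the current weights and the learning rate are hypothetical values.

```python
def gradient_step(weights, gradient, learning_rate):
    """Move each weight a small step against its gradient component."""
    return [w - learning_rate * g for w, g in zip(weights, gradient)]


grad = [-0.02, 0.1, -0.05]  # gradient vector from the text
w = [0.5, -0.3, 0.8]        # hypothetical current weights
lr = 0.1                    # hypothetical learning rate

w_new = gradient_step(w, grad, lr)
print([round(v, 3) for v in w_new])  # [0.502, -0.31, 0.805]
```

A negative gradient component increases the corresponding weight, and a positive one decreases it.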


Limitations of Vectors

All eight types of vectors mentioned above are proof that vectors are a powerful tool in RL, but they also come with several limitations.

Below is a list of the most important drawbacks:

1. High-Dimensionality Problem

When dealing with complex environments, state and action vectors can become extremely large. This is a problem because large vectors require more memory and longer processing times.

EXAMPLE 1: A robot with 100 sensors might have a 100-dimensional state vector, making learning difficult.

EXAMPLE 2: In autonomous driving, the state vector may include pixel data from a camera, leading to millions of features. Standard RL struggles with such high-dimensional vectors.
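A quick back-of-the-envelope calculation shows how fast the numbers grow. The camera resolution below is an assumption chosen for illustration.

```python
# Hypothetical camera frame: 1280 x 720 pixels, 3 color channels.
width, height, channels = 1280, 720, 3

n_features = width * height * channels  # length of the flattened state vector
bytes_float32 = n_features * 4          # 4 bytes per 32-bit float

print(n_features)                 # 2764800
print(bytes_float32 / 2**20)      # 10.546875 (MiB per frame)
```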

2. Loss of Structure & Relationships

A vector represents data as an ordered list of numbers. However, real-world data often has spatial, temporal, or relational structure (e.g., images, graphs). Flattening such data into a plain vector discards those relationships. A Convolutional Neural Network (CNN) is often needed to extract meaningful patterns.

EXAMPLE 3: A game board (chess, Go) has a spatial structure that is not well captured by a vector.

EXAMPLE 4: In image-based RL, an agent trained on raw pixel vectors will struggle to recognize objects. A common solution is to use CNNs to extract structured features from the images before converting them into vectors.

3. Discrete vs. Continuous Action Spaces

Some RL problems involve continuous actions (e.g., adjusting a robot’s joint angles). Representing continuous actions as discrete vectors can lead to poor performance. Discretization of actions leads to loss of precision. Fine motor control in robotics needs continuous action vectors.

EXAMPLE 5: A robotic arm needs to rotate by precise degrees (e.g., 3.56°), but a discrete vector can only store fixed angles (e.g., 3° or 4°).
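The loss of precision can be sketched by snapping a continuous angle to a fixed grid. The helper `discretize` and the step sizes are assumptions for illustration.

```python
def discretize(angle, step=1.0):
    """Snap a continuous angle to the nearest multiple of `step`."""
    return round(angle / step) * step


target = 3.56                     # desired rotation in degrees
snapped = discretize(target)      # nearest whole degree
error = abs(target - snapped)     # about 0.44 degrees

print(snapped)  # 4.0

# A finer grid reduces the error but multiplies the number of actions.
finer = discretize(target, step=0.1)  # approximately 3.6
finer_error = abs(target - finer)     # about 0.04 degrees
```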

4. Fixed-Length Limitation

RL tasks often involve changing environments, where the number of relevant features may increase or decrease over time. Fixed-length vectors cannot handle variable input sizes efficiently. One solution is to replace fixed-length vectors with graph-based representations (e.g., Graph Neural Networks).

EXAMPLE 6: In multi-agent RL, the number of agents may change dynamically, but the state vector remains fixed-size, making adaptation difficult.
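The sketch below illustrates the problem rather than a solution: packing a varying number of agents into a fixed-size state vector forces zero-padding when agents are missing and truncation when there are too many. The capacity `MAX_AGENTS` and the helper `pack_state` are hypothetical.

```python
MAX_AGENTS = 3  # hypothetical capacity, fixed when the model is built


def pack_state(agent_positions, max_agents=MAX_AGENTS, pad=0.0):
    """Flatten (x, y) positions into a vector of length 2 * max_agents."""
    kept = agent_positions[:max_agents]            # extra agents are dropped
    flat = [coord for pos in kept for coord in pos]
    flat += [pad] * (2 * max_agents - len(flat))   # missing agents are padded
    return flat


two = pack_state([(1.0, 2.0), (3.0, 4.0)])
four = pack_state([(1.0, 2.0), (3.0, 4.0), (5.0, 6.0), (7.0, 8.0)])

print(two)   # [1.0, 2.0, 3.0, 4.0, 0.0, 0.0]
print(four)  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  (the fourth agent is lost)
```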

5. Poor Generalization to Unseen States

Vectors store exact numerical values, but they don’t inherently capture relationships or similarities between states. If an agent encounters a new state it has never seen before, the vector representation provides no context. This is a problem since RL requires generalization to new environments. Standard vector representations lead to overfitting to specific training conditions.

EXAMPLE 7: A robot trained in a simulation may perform poorly in the real world because vector-based states don’t adapt well to new inputs. One solution is to use state embeddings that capture higher-level similarities between different states.

6. Computational Complexity in Large Environments

In high-dimensional spaces, vector-based calculations (e.g., distance, gradient updates) become expensive, which significantly increases training time for large state and action vectors. Many RL tasks also require millions of training iterations, so the cost adds up. A common mitigation is to use dimensionality reduction techniques (PCA, autoencoders) to reduce vector size. Another is to speed up RL training with parallel computing.

EXAMPLE 8: AlphaGo combined deep learning with RL, but it relied on large-scale distributed hardware (many CPUs and GPUs) to train and play efficiently.

