This page was last edited on 05 November 2025
In the previous page of the tutorial we covered how vectors are used to represent the agent’s states and actions. In this section we will explore derivatives.
Derivatives are important in Reinforcement Learning because they help accelerate the learning process, show how to adjust the policy parameters so that the agent makes better decisions, and keep learning updates consistent.
Derivatives play several roles in RL:
- Policy Optimization: Derivatives show how to adjust the policy parameters so that the agent’s actions earn more reward.
- Value Function Approximation: Derivatives refine the predictions about future rewards.
- Backpropagation: In the backpropagation process, the derivatives are used to update neural network weights during training.
Derivatives help solve several fundamental problems in RL:
- Optimization Challenges: Derivatives tell us how to change the agent’s parameters to get better outcomes.
- Non-Linear Dynamics: Derivatives help the agent navigate complex environments where outcomes are not straightforward.
- Function Approximation: Derivatives help us estimate the value functions accurately.
Put simply, a derivative measures the rate of change of a function’s output(s) with respect to a change in its input(s). For a function f(x), the derivative tells you how fast the function increases or decreases as x changes.
To make this concrete, let’s work through an example: differentiating the position equation of an object in accelerated motion.
STEP 1: The definition of the derivative
The derivative of a function x(t) is defined as follows:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle \frac{dx}{dt} = \lim_{\Delta t \to 0} \frac{x(t + \Delta t) - x(t)}{\Delta t} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-b552b061ececc3340b5c491da3b9a92b_l3.png)
Where:
- x(t) – A function that depends on time t (e.g., position of an object at time t)
- dx – An infinitesimally small change in x
- dt – An infinitesimally small change in time t
- Δt – A finite small change in time, used in the difference quotient before taking the limit
- lim Δt→0 – The limit operation that makes the difference quotient approach an exact derivative by reducing Δt to an infinitesimally small value
This is the rate of change of position over time. It tells us how fast x(t) is changing as time t progresses.
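We can watch the limit at work numerically. The minimal Python sketch below uses a helper `difference_quotient` and a sample function x(t) = t². Both are our own illustrative choices and not part of the original example. The sketch evaluates the difference quotient at t = 3 for shrinking Δt. For this function the quotient works out to 6 + Δt, so it approaches the exact derivative 2 · 3 = 6 as Δt gets smaller.

```python
def difference_quotient(x, t, dt):
    """Forward difference quotient (x(t + dt) - x(t)) / dt."""
    return (x(t + dt) - x(t)) / dt


def x(t):
    # Sample position function x(t) = t**2, chosen for illustration only
    return t ** 2


# As dt shrinks, the quotient approaches the exact derivative dx/dt = 2t = 6 at t = 3
for dt in [1.0, 0.1, 0.01, 0.001]:
    print(f"dt = {dt:<6} quotient = {difference_quotient(x, 3.0, dt):.6f}")
```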
STEP 2: The equation of position of an object in accelerated motion
It’s one of the equations of motion in physics, and it describes the position of an object as a function of time, assuming that the object is moving with constant acceleration.
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle x(t) = \frac{1}{2} at^2 + v_0 t + x_0 \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-0a4942893c798498aaedb5cdae5f65f4_l3.png)
Where:
- x(t) – Position of an object as a function of time t
- t – Time variable
- a – Constant acceleration of the object
- v₀ – Initial velocity of the object (velocity at t = 0)
- x₀ – Initial position of the object (position at t = 0)
We use specific values to make the calculations clear:
- acceleration: a = 6 m/s²
- initial velocity (at t = 0): v₀ = 2 m/s
- initial position: x₀ = 0 m
Substituting these values gives x(t) = ½(6)t² + 2t + 0, which simplifies to:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle x(t) = 3t^2 + 2t \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-668a6729d13790291d5b8d54bcadf71f_l3.png)
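As a quick sanity check, the sketch below evaluates the general equation with a = 6, v₀ = 2 and x₀ = 0 and compares it with the simplified form 3t² + 2t. The function name `position` is a hypothetical helper.

```python
def position(t, a, v0, x0):
    """Position under constant acceleration: x(t) = 0.5*a*t**2 + v0*t + x0."""
    return 0.5 * a * t ** 2 + v0 * t + x0


# Values used in the text: a = 6 m/s^2, v0 = 2 m/s, x0 = 0 m
for t in range(5):
    general = position(t, a=6.0, v0=2.0, x0=0.0)
    simplified = 3 * t ** 2 + 2 * t  # the simplified form x(t) = 3t^2 + 2t
    print(t, general, simplified)
```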
STEP 3: We apply the definition of the derivative
We calculate the derivative using the basic formula:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle \frac{dx}{dt} = \lim_{\Delta t \to 0} \frac{x(t + \Delta t) - x(t)}{\Delta t} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-b552b061ececc3340b5c491da3b9a92b_l3.png)
We substitute the function x(t) = 3t² + 2t:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle \frac{dx}{dt} = \lim_{\Delta t \to 0} \frac{3(t + \Delta t)^2 + 2(t + \Delta t) - (3t^2 + 2t)}{\Delta t} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-079028fce643bb50f888144f1400de8d_l3.png)
STEP 4: Expand the terms in the numerator
Calculate (t + Δt)²:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle (t + \Delta t)^2 = t^2 + 2t\Delta t + (\Delta t)^2 \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-8d870bc9022ce406f5307e301bc51e1b_l3.png)
We substitute into the equation:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle \frac{dx}{dt} = \lim_{\Delta t \to 0} \frac{3(t^2 + 2t\Delta t + (\Delta t)^2) + 2t + 2\Delta t - (3t^2 + 2t)}{\Delta t} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-1280c4d1856f7270ef6a845e17945567_l3.png)
We distribute the factor 3:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle \lim_{\Delta t \to 0} \frac{(3t^2 + 6t\Delta t + 3(\Delta t)^2) + 2t + 2\Delta t - 3t^2 - 2t}{\Delta t} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-efcee01307e8d335c810b927b1c959a1_l3.png)
The terms 3t² and −3t² cancel, as do 2t and −2t. The result is:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle \lim_{\Delta t \to 0} \frac{6t\Delta t + 3(\Delta t)^2 + 2\Delta t}{\Delta t} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-949aebab8c4af9d882eaf812e1ea1bbf_l3.png)
We can factor Δt out of all terms:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle \lim_{\Delta t \to 0} \frac{\Delta t(6t + 3\Delta t + 2)}{\Delta t} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-42b3de10970cc3597ca553983976d9a4_l3.png)
Because Δt ≠ 0 inside the limit, we can cancel it:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle \lim_{\Delta t \to 0} (6t + 3\Delta t + 2) \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-180c5b711ed6a5733661ea33c8decdb5_l3.png)
We apply the limit Δt → 0. As Δt approaches zero, the 3Δt term vanishes, leaving the velocity v(t) = dx/dt:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle v(t) = 6t + 2 \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-d67cb6b053c2c6174a6abd22aaa4ad2f_l3.png)
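We can check the derivation numerically. Before the limit is taken, the algebra above says the difference quotient equals 6t + 3Δt + 2 exactly. The sketch below compares the two expressions at t = 2 for a few values of Δt, and both approach v(2) = 14. Because of floating-point rounding, the match is close rather than exact.

```python
def x(t):
    # Position function from the text: x(t) = 3t^2 + 2t
    return 3 * t ** 2 + 2 * t


def v(t):
    # Derivative obtained from the limit: v(t) = 6t + 2
    return 6 * t + 2


t = 2.0
for dt in [0.1, 0.01, 0.001]:
    quotient = (x(t + dt) - x(t)) / dt
    # Before the limit, the algebra gives 6t + 3*dt + 2; the 3*dt term vanishes as dt -> 0
    print(f"dt={dt}: quotient={quotient:.6f}, 6t+3dt+2={6 * t + 3 * dt + 2:.6f}, v(t)={v(t)}")
```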
STEP 5: Demonstration
To demonstrate the role of the derivative and how to interpret the results, we will do 5 iterations of manual calculation.
We will use the equations of motion to calculate position, velocity, and acceleration at 5 different times and interpret what these results mean.
Position equation: x(t) = 3t² + 2t
Velocity equation (derivative of position): v(t) = 6t + 2
If we differentiate once more, we get the acceleration (the derivative of velocity):
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle a(t) = \frac{d}{dt} v(t) = \frac{d}{dt} (6t + 2) = 6 \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-fcfff4865e6e2cd1f8a62e531ac5d4dd_l3.png)
We will calculate the position, velocity and acceleration at t= 0, 1, 2, 3, 4 seconds.
ITERATION 1
t = 0 s
x(t) = 3t² + 2t = 3(0)² + 2(0) = 0 m
v(t) = 6t + 2 = 6(0) + 2 = 2 m/s
a(t) = 6 m/s²
ITERATION 2
t = 1 s
x(t) = 3t² + 2t = 3(1)² + 2(1) = 3 + 2 = 5 m
v(t) = 6t + 2 = 6(1) + 2 = 8 m/s
a(t) = 6 m/s²
ITERATION 3
t = 2 s
x(t) = 3t² + 2t = 3(2)² + 2(2) = 12 + 4 = 16 m
v(t) = 6t + 2 = 6(2) + 2 = 14 m/s
a(t) = 6 m/s²
ITERATION 4
t = 3 s
x(t) = 3t² + 2t = 3(3)² + 2(3) = 27 + 6 = 33 m
v(t) = 6t + 2 = 6(3) + 2 = 20 m/s
a(t) = 6 m/s²
ITERATION 5
t = 4 s
x(t) = 3t² + 2t = 3(4)² + 2(4) = 48 + 8 = 56 m
v(t) = 6t + 2 = 6(4) + 2 = 26 m/s
a(t) = 6 m/s²
In the table below, you can find an overview of the manual calculations for all five iterations.
| t (s) | x(t) (m) | v(t) (m/s) | a(t) (m/s²) |
|---|---|---|---|
| 0 | 0 | 2 | 6 |
| 1 | 5 | 8 | 6 |
| 2 | 16 | 14 | 6 |
| 3 | 33 | 20 | 6 |
| 4 | 56 | 26 | 6 |
We can observe the following aspects:
- The position x(t) increases at an accelerating rate (because of the t² term). Position x(t) tells us where the object is at a given time.
- The velocity v(t) increases linearly (because the derivative of a degree-2 polynomial is a degree-1 polynomial). Velocity v(t), the derivative of position, tells us how fast the object is moving.
- The acceleration is constant at 6 m/s², which shows that the motion is uniformly accelerated. Acceleration a(t), the derivative of velocity, tells us how fast the velocity is changing.
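The five iterations and the table can be reproduced in a few lines of Python. This minimal sketch hard-codes the formulas from this page. It also uses central differences, a numerical approximation we introduce here, to cross-check that the derivative of x(t) matches v(t) and that the derivative of v(t) matches a(t).

```python
def x(t):
    return 3 * t ** 2 + 2 * t   # position (m)


def v(t):
    return 6 * t + 2            # velocity (m/s), dx/dt


def a(t):
    return 6                    # acceleration (m/s^2), dv/dt; constant, so t is unused


rows = [(t, x(t), v(t), a(t)) for t in range(5)]
print("t (s) | x (m) | v (m/s) | a (m/s^2)")
for row in rows:
    print(" | ".join(str(value) for value in row))

# Cross-check with central differences: derivative of x matches v, derivative of v matches a
h = 1e-4
for t in range(5):
    assert abs((x(t + h) - x(t - h)) / (2 * h) - v(t)) < 1e-6
    assert abs((v(t + h) - v(t - h)) / (2 * h) - a(t)) < 1e-6
```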
In the graph below, the tangent line at t = 2 shows how quickly the position is changing at that moment.

The slope of the tangent equals the instantaneous velocity, here v(2) = 14 m/s. It shows the object’s speed and direction at that moment, and where the object would go if it kept moving at that same speed.
The tangent (red line) changes from moment to moment because the speed changes. If the object moved at a constant speed, the tangent would always have the same slope. Because the object accelerates at 6 m/s², the tangent becomes steeper over time, which indicates that its speed is increasing.
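The tangent line is easy to write down. At t₀ it passes through the point (t₀, x(t₀)) and has slope v(t₀). The sketch below builds the tangent at t = 2, which is x = 16 + 14(t − 2), and compares it with the curve. The helper `tangent_line` is hypothetical. Because the curve bends upward, it lies above the tangent on both sides of t = 2.

```python
def x(t):
    return 3 * t ** 2 + 2 * t  # position (m)


def v(t):
    return 6 * t + 2           # slope of the tangent = velocity (m/s)


def tangent_line(t0):
    """Return the tangent to x(t) at t0 as a function: x(t0) + v(t0) * (t - t0)."""
    slope, x_t0 = v(t0), x(t0)
    return lambda t: x_t0 + slope * (t - t0)


tan2 = tangent_line(2)
for t in [1.5, 2.0, 2.5]:
    print(t, x(t), tan2(t))  # curve vs. tangent: they touch at t = 2

print(v(1), v(2), v(3))      # the tangent gets steeper over time
```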
How will derivatives help us in RL?
For example, a robot trained with Reinforcement Learning (RL) needs to learn how to move on its own. The slope of the tangent tells it how fast its position is changing, so it knows when to stop, accelerate, or avoid obstacles.
If the robot did not understand the tangent (speed), it would not be able to make correct decisions.
For instance, if it needs to stop before hitting a wall, it must know how fast it is moving and how to slow down.
Reinforcement Learning uses several types of derivatives. Each has a specific role in optimizing the agent and its own trade-offs between accuracy, efficiency and stability.
Types of Derivatives
1. Gradient
The gradient collects the partial derivatives of a function of several variables into a single vector, so it captures how the function changes along every input direction.
The gradient points in the direction of fastest increase of the function, which is why it is used to optimize models in machine learning and neural networks.
Mathematically, the gradient of a function f(x, y) is:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle \nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-573a3c7e4c49e6144ce2e25c97f5efe9_l3.png)
Where:
- ∇f – The gradient of the function f. It is a vector indicating the direction and maximum rate of increase of the function.
- ∂f/∂x – The partial derivative of f with respect to x: the rate of change of f when only x varies and y is held fixed.
- ∂f/∂y – The partial derivative of f with respect to y: the rate of change of f when only y varies and x is held fixed.
ANALOGY
Imagine climbing a hill blindfolded. If you want to get to the top as quickly as possible, you have to choose the direction in which the slope is steepest. The gradient tells you exactly that direction!
If you were on a flat surface, the gradient would be zero, indicating that there is no slope to climb.
EXAMPLE 1: Simple gradient example
We have the simple function:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle f(x, y) = x^2 + y^2 \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-cd495920c3387c43a96d8c9526602754_l3.png)
The gradient is:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle \nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) = (2x, 2y) \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-89d5c2b52cce520c7575e1ee52fde5cc_l3.png)
If we choose a point, for example (1, 2), the gradient becomes:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle \nabla f(1,2) = (2 \cdot 1, 2 \cdot 2) = (2,4) \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-d6335e587b71d24579ae933d9efbfe49_l3.png)
At the point (1, 2), the vector (2, 4) points in the direction of fastest increase of the function. Its length, √20 ≈ 4.47, is the rate of increase in that direction.
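The gradient can also be estimated numerically, which is a handy way to check an analytic result. This minimal sketch uses central differences to estimate ∇f at (1, 2) and its length. Central differences are an approximation we introduce for illustration, and the step size h = 1e-5 is an assumed value.

```python
import math


def f(x, y):
    return x ** 2 + y ** 2


def numerical_gradient(func, x, y, h=1e-5):
    """Central-difference estimate of (df/dx, df/dy) at (x, y)."""
    dfdx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return dfdx, dfdy


gx, gy = numerical_gradient(f, 1.0, 2.0)
print(gx, gy)               # close to the analytic gradient (2, 4)
print(math.hypot(gx, gy))   # steepest rate of increase, close to sqrt(20) ≈ 4.47
```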
In RL applications, the gradient is applied in cases such as:
- REINFORCE Algorithm to improve action selection.
- Deep Q-Networks (DQN) to minimize the Mean Squared Error (MSE) between target Q-values and predicted Q-values.
- Noisy Networks for Exploration, where gradients update the noise parameters to learn better exploration strategies.
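The algorithms listed above all rely on the same basic move: taking small steps along the gradient. They step uphill to maximize expected return, or downhill to minimize a loss such as the MSE in DQN. The toy sketch below does not implement any of those algorithms. It simply minimizes f(x, y) = x² + y² by gradient descent, starting from (1, 2) with an arbitrarily chosen learning rate of 0.1.

```python
def grad_f(x, y):
    # Analytic gradient of f(x, y) = x^2 + y^2
    return 2 * x, 2 * y


x, y = 1.0, 2.0       # starting point from the example above
learning_rate = 0.1   # arbitrary step size for this toy example
for step in range(50):
    gx, gy = grad_f(x, y)
    # Move against the gradient to decrease f (adding instead would climb the surface)
    x -= learning_rate * gx
    y -= learning_rate * gy

print(x, y, x ** 2 + y ** 2)  # approaches the minimum at (0, 0)
```

Each step here multiplies both coordinates by 1 − 2 × 0.1 = 0.8, so the point shrinks steadily toward the origin. A learning rate above 1.0 would make this particular example diverge.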
2. Jacobian Matrix
The Jacobian Matrix extends the derivative to functions with multiple inputs and multiple outputs.
If we have a function with multiple variables, the Jacobian Matrix tells us how each input variable affects each output variable.
If we have a function f: ℝⁿ → ℝᵐ, that is, a function that takes a vector of dimension n and returns a vector of dimension m, the Jacobian Matrix is the m × n matrix of its first-order partial derivatives:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle J_f(x) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-1dafbdf4a6010ded7916a47d398d949c_l3.png)
ANALOGY
An analogy for the Jacobian Matrix is how the movement of the steering wheel influences the trajectory of the car.
- Input variables: Steering wheel rotation and speed.
- Output variables: Direction and position of the car.
If a small turn of the steering wheel produces a large change in direction, the corresponding entry of the Jacobian (a partial derivative) is large. The Jacobian Matrix collects how a small change in each input affects each output.
EXAMPLE 2: Simple example of Jacobian Matrix
We have the following simple vector function:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle f(x, y) = \begin{bmatrix} x^2 + y \\ \sin(x) + y^2 \end{bmatrix} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-a53f1fd46dc2056833a6dcddde01e31d_l3.png)
The Jacobian Matrix is calculated by taking the partial derivatives:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle J_f(x, y) = \begin{bmatrix} \frac{\partial}{\partial x} (x^2 + y) & \frac{\partial}{\partial y} (x^2 + y) \\ \frac{\partial}{\partial x} (\sin(x) + y^2) & \frac{\partial}{\partial y} (\sin(x) + y^2) \end{bmatrix} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-f94407fe9ccedd926ea6ea9b637af3db_l3.png)
Calculating each derivative:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle J_f(x, y) = \begin{bmatrix} 2x & 1 \\ \cos(x) & 2y \end{bmatrix} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-ffa32d48ecbc577ad17ee8e7a2423281_l3.png)
This matrix tells us how each component of the function f(x, y) changes when we change x and y.
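To make the matrix concrete, this sketch evaluates the analytic Jacobian at the point (0, 1), where it equals [[0, 1], [1, 2]], and compares it with a central-difference estimate. The helper names and the step size h are illustrative assumptions.

```python
import math


def f(x, y):
    # Vector function from the example: (x^2 + y, sin(x) + y^2)
    return (x ** 2 + y, math.sin(x) + y ** 2)


def analytic_jacobian(x, y):
    return [[2 * x, 1.0],
            [math.cos(x), 2 * y]]


def numerical_jacobian(func, x, y, h=1e-6):
    """Entry [i][j] is the central-difference derivative of output i w.r.t. input j."""
    fx_plus, fx_minus = func(x + h, y), func(x - h, y)
    fy_plus, fy_minus = func(x, y + h), func(x, y - h)
    return [[(fx_plus[i] - fx_minus[i]) / (2 * h), (fy_plus[i] - fy_minus[i]) / (2 * h)]
            for i in range(2)]


J_exact = analytic_jacobian(0.0, 1.0)
J_approx = numerical_jacobian(f, 0.0, 1.0)
print(J_exact)   # [[0.0, 1.0], [1.0, 2.0]]
print(J_approx)  # agrees closely with J_exact
```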
In RL applications, the Jacobian matrix is used for:
- computing how policy parameters affect the action distribution.
- measuring how small changes in policy parameters affect the selected action.
- propagating derivatives backward over multiple timesteps to update policy parameters efficiently.
- preventing the agent from overfitting to noisy state transitions.
3. Hessian Matrix
The Hessian Matrix tells us how fast the slope changes in different directions, helping refine optimization techniques.
The Hessian Matrix is a square matrix of second-order partial derivatives of a scalar function.
Given a function f(x) with multiple variables, the Hessian Matrix H is defined as:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle H(f) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-fe7d43c017ca4bc2dc6cd330c4a685b2_l3.png)
ANALOGY
Imagine you’re skiing on a mountain. The Hessian tells you how the steepness changes as you move.
- If you’re at the bottom of a valley, the ground curves upward in every direction: the Hessian is positive definite (convex), indicating a local minimum.
- If you’re on a mountain pass, the ground curves upward in one direction but downward in another (saddle point).
- If you’re at the peak of a hill, the ground curves downward in every direction: the Hessian is negative definite, indicating a local maximum.
EXAMPLE 3: Simple example of Hessian Matrix
Consider the function:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle f(x, y) = x^2 + 2xy + y^2 \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-f57a1a2ca9091ab9088c34d160b1c4fa_l3.png)
First, compute the first-order derivatives:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle \frac{\partial f}{\partial x} = 2x + 2y, \quad \frac{\partial f}{\partial y} = 2x + 2y \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-b8208bf242ea89e5e2bd97afac049528_l3.png)
Now, compute the second-order derivatives:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle \frac{\partial^2 f}{\partial x^2} = 2, \quad \frac{\partial^2 f}{\partial y^2} = 2, \quad \frac{\partial^2 f}{\partial x \partial y} = 2 \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-d78c637769e988aef2352faf114563ad_l3.png)
The Hessian matrix is:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle H(f) = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-8c424dde5cf64c147f665971599ab383_l3.png)
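Because every entry is constant, the curvature is the same at every point, but it is not the same in every direction. The function can be written as f(x, y) = (x + y)². It curves upward (curvature 4) along the direction (1, 1)/√2 and is completely flat (curvature 0) along (1, −1)/√2. The Hessian is therefore positive semidefinite: f is convex, and every point on the line x + y = 0 is a minimum.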
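The curvature of f along a unit direction u is the quadratic form uᵀHu. The sketch below evaluates it for the two diagonal directions. It also computes the eigenvalues of this symmetric 2 × 2 matrix with the closed-form formula. The function and variable names are our own.

```python
import math


def f(x, y):
    return x ** 2 + 2 * x * y + y ** 2  # equals (x + y)^2


H = [[2.0, 2.0],
     [2.0, 2.0]]  # Hessian from the text (constant, so the same at every point)


def curvature(H, u):
    """Second directional derivative u^T H u along a unit vector u."""
    return sum(u[i] * H[i][j] * u[j] for i in range(2) for j in range(2))


s = 1 / math.sqrt(2)
print(curvature(H, (s, s)))    # about 4: f curves upward along x = y
print(curvature(H, (s, -s)))   # about 0: f is flat along x = -y
print(f(1.5, -1.5))            # 0.0: points with x + y = 0 sit at the minimum

# Eigenvalues of a symmetric 2x2 matrix [[p, q], [q, r]]: (p + r)/2 ± sqrt(((p - r)/2)^2 + q^2)
mean = (H[0][0] + H[1][1]) / 2
radius = math.sqrt(((H[0][0] - H[1][1]) / 2) ** 2 + H[0][1] ** 2)
print(mean + radius, mean - radius)  # 4.0 0.0
```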
In RL applications, the Hessian matrix is applied in cases like:
- updating policies more efficiently than standard gradient methods by using second-order (curvature) information.
- keeping policy updates within a safe range, which prevents drastic changes in the policy that can lead to instability (for example, the trust-region constraint in TRPO).
- speeding up convergence in policy optimization.
4. Directional Derivative
Many real-world problems involve movement in arbitrary directions, not just along coordinate axes.
The Directional Derivative measures how a function changes as we move in a specific direction. It is computed as the dot product of the gradient ∇f with the unit vector pointing in that direction.
ANALOGY
The Directional Derivative answers the question: “If I walk in this particular direction, how fast will my altitude change?”
EXAMPLE 4: Simple example of Directional Derivative
We have this function:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle f(x, y) = x^2 + y^2 \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-cd495920c3387c43a96d8c9526602754_l3.png)
Find the directional derivative at (1, 1) in the direction of v=(3, 4).
STEP 1: Compute the Gradient
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle \nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) = (2x, 2y) \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-2155412622ea693785101848b04d10fe_l3.png)
At (1,1):
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle \nabla f(1,1) = (2,2) \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-8525897321f4c60cf8abac494744adbc_l3.png)
STEP 2: Normalize the Direction Vector
The given direction is v=(3,4). The unit vector is:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle \mathbf{v}_{\text{unit}} = \frac{(3,4)}{\sqrt{3^2 + 4^2}} = \left( \frac{3}{5}, \frac{4}{5} \right) \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-875a67fd6c692aa7773f9b122d2a25af_l3.png)
STEP 3: Compute the Directional Derivative
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle D_{\mathbf{v}} f(1,1) = \nabla f(1,1) \cdot \mathbf{v}_{\text{unit}} \\ \vspace{3mm} \\ \displaystyle = (2,2) \cdot \left( \frac{3}{5}, \frac{4}{5} \right) \\ \vspace{3mm} \\ \displaystyle = 2 \times \frac{3}{5} + 2 \times \frac{4}{5} = \frac{6}{5} + \frac{8}{5} = \frac{14}{5} = 2.8 \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-8672f1426b18cb727dc45860d04caf20_l3.png)
Interpretation: Moving in the direction of (3, 4), the function f(x, y) increases at a rate of 2.8 per unit of movement.
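Here is the same calculation as a minimal Python sketch. The hypothetical helper `directional_derivative` normalizes v and takes its dot product with the gradient. A central-difference step along the unit vector (3/5, 4/5) serves as an independent check.

```python
import math


def f(x, y):
    return x ** 2 + y ** 2


def directional_derivative(grad, v):
    """Dot product of the gradient with the unit vector along v."""
    norm = math.hypot(v[0], v[1])
    u = (v[0] / norm, v[1] / norm)
    return grad[0] * u[0] + grad[1] * u[1]


grad_at_11 = (2.0, 2.0)               # analytic gradient (2x, 2y) at (1, 1)
print(directional_derivative(grad_at_11, (3.0, 4.0)))  # 2.8, up to rounding

# Finite-difference check: step a small distance h along the unit vector (3/5, 4/5)
h = 1e-6
estimate = (f(1 + h * 0.6, 1 + h * 0.8) - f(1 - h * 0.6, 1 - h * 0.8)) / (2 * h)
print(estimate)  # also close to 2.8
```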
In RL applications, the Directional Derivative is applied in cases like:
- evaluating how promising a specific direction of improvement is before updating the parameters along it.
- determining how changing the weights in a specific direction affects the expected reward.
- exploring directions other than the strict gradient, for example directions chosen for their information gain.
5. Total Derivative
A Total Derivative measures how a function changes when all of its variables change together, typically because they all depend on a common variable such as time.
In dynamical systems, control theory, and RL, states often evolve over time due to multiple factors, so a Total Derivative is needed to capture the full effect of changes.
ANALOGY
Imagine you’re climbing a mountain, but instead of just moving upwards, you are also moving sideways due to wind blowing you in a certain direction. The Total Derivative gives the rate of change in the direction you’re actually moving.
EXAMPLE 5: Simple example of Total Derivative
We have this function:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle f(x, y) = x^2 + y^2 \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-cd495920c3387c43a96d8c9526602754_l3.png)
where both x and y depend on time t, such that:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle x = t^2, \quad y = \sin(t) \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-38dcc018710f6fa588cd3b3c9dc22ca9_l3.png)
The Total Derivative is:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle \frac{d f}{d t} = \frac{\partial f}{\partial x} \frac{d x}{d t} + \frac{\partial f}{\partial y} \frac{d y}{d t} \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-b91a90328f11a0542bd242428e2e3124_l3.png)
Computing each term:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle \frac{\partial f}{\partial x} = 2x, \quad \frac{\partial f}{\partial y} = 2y \\ \vspace{3mm} \\ \displaystyle \frac{d x}{d t} = 2t, \quad \frac{d y}{d t} = \cos(t) \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-b11b805e8ff6eab725995a9b26c4adb6_l3.png)
Therefore:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle \frac{d f}{d t} = 2x (2t) + 2y (\cos t) \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-bcf0f551ab421e6abd29ace11c310d07_l3.png)
Substituting x = t² and y = sin t:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle \frac{d f}{d t} = 2 (t^2)(2t) + 2 (\sin t)(\cos t) \\ \vspace{5mm} \\ \displaystyle \frac{d f}{d t} = 4t^3 + 2 \sin t \cos t \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-e0c23d1e7d5fdda8ff4375cea176a1b6_l3.png)
This Total Derivative tells us how f(x, y) changes over time. It combines the effect of x changing with t and the effect of y changing with t.
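Here is a quick numerical check of the result. It assumes t = 1 and a step size h = 1e-6, both chosen for illustration. Composing f with x = t² and y = sin t and differentiating numerically should agree with 4t³ + 2 sin t cos t.

```python
import math


def f_of_t(t):
    # f(x, y) = x^2 + y^2 with x = t^2 and y = sin(t)
    x, y = t ** 2, math.sin(t)
    return x ** 2 + y ** 2


def total_derivative(t):
    # Chain rule: df/dt = (df/dx)(dx/dt) + (df/dy)(dy/dt) = 4t^3 + 2 sin(t) cos(t)
    return 4 * t ** 3 + 2 * math.sin(t) * math.cos(t)


t, h = 1.0, 1e-6
numeric = (f_of_t(t + h) - f_of_t(t - h)) / (2 * h)
print(total_derivative(t), numeric)  # the two values agree closely
```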
In RL applications, the Total Derivative is applied in cases like:
- the gradient of the expected return involves a total derivative because rewards depend indirectly on actions and states, which themselves depend on previous decisions.
- policy optimization (e.g., DDPG, PPO), to compute how small changes in policy parameters influence long-term outcomes through the actions and states they affect.
6. Partial Derivative
A Partial Derivative measures how a function changes as only one of its inputs changes.
In Reinforcement Learning, this is similar to adjusting one parameter at a time to see how it affects the overall performance, keeping other parameters fixed.
ANALOGY
Imagine you are baking a cake, and the taste depends on sugar, flour, and butter. If you want to know how changing only the sugar affects the taste while keeping flour and butter the same, you are computing a partial derivative.
EXAMPLE 6: Simple example of Partial Derivative
The function is:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle f(x, y) = x^2 + 3xy + y^2 \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-a8b8fb3d14244640d8f2af2737757dbc_l3.png)
Partial Derivative with respect to x is:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle \frac{\partial f}{\partial x} = 2x + 3y \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-fde152ec48b1b23eb1d2a6404e65dc85_l3.png)
Partial Derivative with respect to y is:
![Rendered by QuickLaTeX.com \[ \hspace{5mm} \fbox{ \begin{array}{c} \vspace{5mm} \\ \displaystyle \frac{\partial f}{\partial y} = 3x + 2y \\ \vspace{5mm} \end{array} } \hspace{5mm} \]](https://www.reinforcementlearningpath.com/wp-content/ql-cache/quicklatex.com-69effe397d10e2c2c692f77c29a28da5_l3.png)
These derivatives tell us:
- ∂f/∂x: how the function changes when x increases while y is held fixed.
- ∂f/∂y: how the function changes when y increases while x is held fixed.
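Here is a minimal numerical check at the point (1, 2). At that point the formulas give ∂f/∂x = 2(1) + 3(2) = 8 and ∂f/∂y = 3(1) + 2(2) = 7. Each helper perturbs only one input, which is exactly what holding the other variable fixed means. The helper names and step size are illustrative.

```python
def f(x, y):
    return x ** 2 + 3 * x * y + y ** 2


def partial_x(func, x, y, h=1e-6):
    # Vary x only; y is held fixed
    return (func(x + h, y) - func(x - h, y)) / (2 * h)


def partial_y(func, x, y, h=1e-6):
    # Vary y only; x is held fixed
    return (func(x, y + h) - func(x, y - h)) / (2 * h)


# At (1, 2): analytic values are 2x + 3y = 8 and 3x + 2y = 7
print(partial_x(f, 1.0, 2.0), partial_y(f, 1.0, 2.0))
```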
In RL applications, the Partial Derivative is applied in cases like:
- Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO) compute the Partial Derivatives of their advantage-weighted loss with respect to each policy parameter and adjust the policy accordingly.
- When a function approximator estimates a value function, the Partial Derivatives of the loss function with respect to each weight are used to update the network.
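As a hypothetical illustration of the last point, assume a linear value function v̂(s) = w · φ(s) and a squared-error loss L = ½(G − v̂(s))² against a target return G. The partial derivative of L with respect to each weight is ∂L/∂wᵢ = −(G − v̂(s)) · φᵢ(s), so each weight moves in proportion to its own feature. The feature vector, target and step size below are made-up numbers. The sketch performs a single stochastic gradient descent step and is not a complete RL algorithm.

```python
def predict(w, features):
    # Linear value estimate: v_hat(s) = sum_i w_i * phi_i(s)
    return sum(wi * fi for wi, fi in zip(w, features))


def loss(w, features, target):
    return 0.5 * (target - predict(w, features)) ** 2


def partial_derivatives(w, features, target):
    # dL/dw_i = -(target - v_hat) * phi_i, one partial derivative per weight
    error = target - predict(w, features)
    return [-error * fi for fi in features]


w = [0.0, 0.0, 0.0]           # hypothetical initial weights
features = [1.0, 0.5, -2.0]   # hypothetical feature vector phi(s)
target = 3.0                  # hypothetical return G observed from state s
alpha = 0.1                   # step size

before = loss(w, features, target)
grads = partial_derivatives(w, features, target)
w = [wi - alpha * gi for wi, gi in zip(w, grads)]
after = loss(w, features, target)
print(before, after)  # the loss decreases after one update
```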