Derivatives: Gradient, Jacobian Matrix, Hessian Matrix, Directional Derivative, Total and Partial Derivative

This page was last edited on 05 November 2025

In the previous page of the tutorial we covered how vectors are used to represent the agent’s states and actions. In this section we will explore derivatives.

Derivatives are important in Reinforcement Learning because they help to accelerate the learning process, adjust the policy parameters so that the agent makes better decisions, and keep learning updates consistent.

The role of derivatives in RL is:

  • Policy Optimization: Derivatives show how to adjust the policy parameters so that the agent's actions earn more reward.
  • Value Function Approximation: Derivatives of the prediction error show how to refine the predictions about future rewards.
  • Backpropagation: In the backpropagation process, the derivatives are used to update neural network weights during training.

Derivatives solve some fundamental problems in RL. These include:

  • Optimization Challenges: Derivatives tell us how to change the actions to get better outcomes.
  • Non-Linear Dynamics: Derivatives help to navigate complex environments where outcomes are not straightforward.
  • Function Approximation: Derivatives help us estimate the value functions accurately.

To explain this concept in a simple way, we can imagine that a derivative measures the rate of change of a function’s output(s) with respect to the change in its input(s). If you have a function f(x), the derivative tells you how fast that function increases or decreases depending on the value of x.

To illustrate this concept further, let’s explore a specific example of how a derivative is used by applying the equation of position to an object in accelerated motion.

STEP 1: The definition of the derivative

The derivative of a function x(t) is defined as follows:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle \frac{dx}{dt} = \lim_{\Delta t \to 0} \frac{x(t + \Delta t) - x(t)}{\Delta t} \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

Where:

  • x(t) – A function that depends on time t (e.g., position of an object at time t)
  • dx – An infinitesimally small change in x
  • dt – An infinitesimally small change in time t
  • Δt – A finite small change in time, used in the difference quotient before taking the limit
  • lim Δt→0 – The limit operation that makes the difference quotient approach an exact derivative by reducing Δt to an infinitesimally small value

This is the rate of change of position over time. It tells us how fast x(t) is changing as time t progresses.

STEP 2: The equation of position of an object in accelerated motion

It’s one of the equations of motion in physics, and it describes the position of an object as a function of time, assuming that the object is moving with constant acceleration.

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle x(t) = \frac{1}{2} at^2 + v_0 t + x_0 \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

Where:

  • x(t) – Position of an object as a function of time t
  • t – Time variable
  • a – Constant acceleration of the object
  • v0 – Initial velocity of the object (velocity at t=0)
  • x0 – Initial position of the object (position at t=0)

We use specific values to make the calculations clear:

  • acceleration: a = 6 m/s²
  • velocity at time t = 0: v0 = 2 m/s
  • initial position: x0 = 0 m

The result is the function:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle x(t) = 3t^2 + 2t \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

STEP 3: We apply the definition of the derivative

We calculate the derivative using the basic formula:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle \frac{dx}{dt} = \lim_{\Delta t \to 0} \frac{x(t + \Delta t) - x(t)}{\Delta t} \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

We substitute the function x(t) = 3t² + 2t:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle \frac{dx}{dt} = \lim_{\Delta t \to 0} \frac{3(t + \Delta t)^2 + 2(t + \Delta t) - (3t^2 + 2t)}{\Delta t} \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

STEP 4: Expand the terms in the numerator

Calculate (t + Δt)²:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle (t + \Delta t)^2 = t^2 + 2t\Delta t + (\Delta t)^2 \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

We substitute into the equation:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle \frac{dx}{dt} = \lim_{\Delta t \to 0} \frac{3(t^2 + 2t\Delta t + (\Delta t)^2) + 2t + 2\Delta t - (3t^2 + 2t)}{\Delta t} \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

We distribute the factor 3:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle \lim_{\Delta t \to 0} \frac{(3t^2 + 6t\Delta t + 3(\Delta t)^2) + 2t + 2\Delta t - 3t^2 - 2t}{\Delta t} \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

We observe that 3t² and −3t², as well as 2t and −2t, cancel out. The result is:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle \lim_{\Delta t \to 0} \frac{6t\Delta t + 3(\Delta t)^2 + 2\Delta t}{\Delta t} \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

We can factor Δt out of all terms:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle \lim_{\Delta t \to 0} \frac{\Delta t(6t + 3\Delta t + 2)}{\Delta t} \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

Now we cancel Δt in the numerator and the denominator:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle \lim_{\Delta t \to 0} (6t + 3\Delta t + 2) \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

We apply the limit Δt→0. When Δt approaches zero, the 3Δt term disappears:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle v(t) = 6t + 2 \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]
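The limit can also be checked numerically. The following is a minimal Python sketch, not part of the original derivation; the helper name `difference_quotient` is an illustrative assumption. It evaluates the difference quotient for x(t) = 3t² + 2t at t = 2 with a shrinking Δt, and the values approach v(2) = 6·2 + 2 = 14.

```python
def x(t):
    """Position from the worked example: x(t) = 3t^2 + 2t."""
    return 3 * t**2 + 2 * t


def difference_quotient(f, t, dt):
    """Average rate of change of f between t and t + dt."""
    return (f(t + dt) - f(t)) / dt


t = 2.0
for dt in (1.0, 0.1, 0.01, 0.001):
    # Algebraically the quotient equals 6t + 3*dt + 2, so its gap to the
    # derivative 6t + 2 shrinks in proportion to dt.
    print(f"dt = {dt:<6} quotient = {difference_quotient(x, t, dt):.4f}")
```

The leftover 3Δt term from the algebra is exactly the gap between each printed quotient and 14.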

STEP 5: Demonstration

To demonstrate the role of the derivative and how to interpret the results, we will do 5 iterations of manual calculation.

We will use the equations of motion to calculate position, velocity, and acceleration at 5 different times and interpret what these results mean.

Position equation: x(t) = 3t² + 2t

Velocity equation (derivative of position): v(t) = 6t + 2

If we differentiate once more, we get the acceleration, which is the derivative of the velocity:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle a(t) = \frac{d}{dt} v(t) = \frac{d}{dt} (6t + 2) = 6 \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]
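As a quick numerical cross-check of a(t) = 6, the sketch below estimates the second derivative of the position directly with a central second difference. The function name `second_difference` and the step size `h` are illustrative choices, not part of the original text.

```python
def x(t):
    """Position from the worked example: x(t) = 3t^2 + 2t."""
    return 3 * t**2 + 2 * t


def second_difference(f, t, h=1e-3):
    """Central second-difference approximation of the second derivative of f at t."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / h**2


for t in (0.0, 1.0, 4.0):
    # For a quadratic the estimate is the same at every t, up to rounding.
    print(f"t = {t}: a(t) is approximately {second_difference(x, t):.6f}")
```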

We will calculate the position, velocity and acceleration at t= 0, 1, 2, 3, 4 seconds.

ITERATION 1

t(s)=0

x(t)=3t²+2t=3(0)²+2(0)=0

v(t)=6t+2=6(0)+2=2

a(t)=6


ITERATION 2

t(s)=1

x(t)=3t²+2t=3(1)²+2(1)=3+2=5

v(t)=6t+2=6(1)+2=8

a(t)=6


ITERATION 3

t(s)=2

x(t)=3t²+2t=3(2)²+2(2)=12+4=16

v(t)=6t+2=6(2)+2=14

a(t)=6


ITERATION 4

t(s)=3

x(t)=3t²+2t=3(3)²+2(3)=27+6=33

v(t)=6t+2=6(3)+2=20

a(t)=6


ITERATION 5

t(s)=4

x(t)=3t²+2t=3(4)²+2(4)=48+8=56

v(t)=6t+2=6(4)+2=26

a(t)=6


In the table below, you can find an overview of the manual calculations for all five iterations.

t (s)    x(t) (m)    v(t) (m/s)    a(t) (m/s²)
0        0           2             6
1        5           8             6
2        16          14            6
3        33          20            6
4        56          26            6
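The five iterations can be reproduced with a few lines of Python. This is a minimal sketch; the function names (`position`, `velocity`, `acceleration`, `motion_table`) are illustrative and simply encode the three equations used above.

```python
def position(t):
    """x(t) = 3t^2 + 2t, in metres."""
    return 3 * t**2 + 2 * t


def velocity(t):
    """v(t) = 6t + 2, the derivative of position, in m/s."""
    return 6 * t + 2


def acceleration(t):
    """a(t) = 6, the derivative of velocity, in m/s^2."""
    return 6


def motion_table(times):
    """Return one (t, x, v, a) row per requested time."""
    return [(t, position(t), velocity(t), acceleration(t)) for t in times]


print(f"{'t (s)':>6} {'x (m)':>6} {'v (m/s)':>8} {'a (m/s^2)':>10}")
for row in motion_table(range(5)):
    print("{:>6} {:>6} {:>8} {:>10}".format(*row))
```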

We can observe the following aspects:

  • The position x(t) increases at an accelerating rate (since the equation contains t²). Position x(t) tells us where the object is at a given time.
  • The velocity v(t) increases linearly (since the derivative of position is a polynomial of degree 1). Velocity v(t), the derivative of position, tells us how fast the object is moving.
  • The acceleration is constant at 6 m/s², which shows that the motion is uniformly accelerated. Acceleration a(t), the derivative of velocity, tells us how fast the velocity is changing.

In the graph below, the tangent shows us how quickly the position is changing at a given moment, in this case at t = 2.

This illustrates how the tangent (velocity) changes over time, helping to better understand the derivative.

The slope of the tangent gives the object's velocity at that moment (its speed and direction), and the tangent line itself shows where the object would be if it continued moving at that same velocity.

The tangent (red line) changes because the speed changes. If the object in motion were moving at a constant speed, the tangent would always remain at the same slope. But since the object accelerates (6 m/s²), the tangent becomes steeper—indicating that its speed is increasing.
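The tangent at t = 2 can also be written out explicitly. The sketch below (with the illustrative helper `tangent_line`) builds the line through the point (2, x(2)) with slope v(2), then compares where the tangent predicts the object will be at t = 3 with where it actually is.

```python
def position(t):
    """x(t) = 3t^2 + 2t."""
    return 3 * t**2 + 2 * t


def velocity(t):
    """v(t) = 6t + 2, the slope of the tangent at time t."""
    return 6 * t + 2


def tangent_line(t0):
    """Return the tangent to x(t) at t0 as a function of t."""
    x0, slope = position(t0), velocity(t0)
    return lambda t: x0 + slope * (t - t0)


tangent_at_2 = tangent_line(2)
print("tangent prediction at t = 3:", tangent_at_2(3))  # 16 + 14 * 1 = 30
print("actual position at t = 3:   ", position(3))      # 3 * 9 + 6 = 33
```

The prediction falls 3 m short because the object keeps accelerating after t = 2, which is why the tangent has to be recomputed at each moment.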

How will derivatives help us in RL?

For example, in Reinforcement Learning (RL), a robot needs to learn how to move on its own. The tangent helps it understand how fast its position is changing so that it knows how to stop, accelerate, or avoid obstacles.

If the robot did not understand the tangent (speed), it would not be able to make correct decisions.

For instance, if it needs to stop before hitting a wall, it must know how fast it is moving and how to slow down.

In Reinforcement Learning there are several types of derivatives, because each has a specific role in optimizing and learning the agent. Each method has trade-offs between accuracy, efficiency and stability.


Types of Derivatives

This section covers six types of derivatives used in RL: the gradient, the Jacobian matrix, the Hessian matrix, the directional derivative, the total derivative, and the partial derivative.

1. Gradient

The gradient collects all the partial derivatives of a scalar function of several variables into one vector, describing how the function changes along each input.

The gradient is used to indicate the direction of the fastest growth of the function and to optimize models in machine learning and neural networks.

Mathematically, the gradient of a function f(x, y) is:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle \nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

Where:

  • ∇f – The gradient of the function f. It is a vector indicating the direction and maximum rate of increase of the function.
  • ∂f – Represents a small change in the function f, but only with respect to a specific variable.
  • ∂x – It represents an infinitesimal change in x.
  • ∂y – Similar to ∂x, but applied to the variable y.

ANALOGY

Imagine climbing a hill blindfolded. If you want to get to the top as quickly as possible, you have to choose the direction in which the slope is steepest. The gradient tells you exactly that direction!

If you were on a flat surface, the gradient would be zero, indicating that there is no slope to climb.

We have the simple function:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle f(x, y) = x^2 + y^2 \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

The gradient is:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle \nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) = (2x, 2y) \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

If we choose a point, for example (1, 2), the gradient becomes:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle \nabla f(1,2) = (2 \cdot 1, 2 \cdot 2) = (2,4) \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

The vector (2, 4) shows us the direction of fastest growth of the function.
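The result (2, 4) can be verified without any calculus by nudging each input slightly and measuring the change in f. The sketch below uses central differences; the helper name `numerical_gradient` and the step size `h` are illustrative assumptions.

```python
def f(x, y):
    """The example function f(x, y) = x^2 + y^2."""
    return x**2 + y**2


def numerical_gradient(func, x, y, h=1e-6):
    """Central-difference estimate of (df/dx, df/dy) at the point (x, y)."""
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return df_dx, df_dy


gx, gy = numerical_gradient(f, 1.0, 2.0)
print(f"numerical gradient at (1, 2): ({gx:.4f}, {gy:.4f})")
print("analytic gradient at (1, 2):  (2, 4)")
```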

In RL applications, the gradient is applied in cases such as:

  • REINFORCE Algorithm to improve action selection.
  • Deep Q-Networks (DQN) to minimize the Mean Squared Error (MSE) between target Q-values and predicted Q-values.
  • Noisy Networks for Exploration, where gradients update the noise parameters to learn better exploration strategies.

2. Jacobian Matrix

The Jacobian Matrix works with multiple inputs and outputs simultaneously.

If we have a function with multiple variables, the Jacobian Matrix tells us how each input variable affects each output variable.

If we have a function f: Rⁿ → Rᵐ, that is, a function that takes a vector of dimension n and returns a vector of dimension m, the Jacobian Matrix is defined as:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle J_f(x) =         \begin{bmatrix}             \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\             \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n} \\             \vdots & \vdots & \ddots & \vdots \\             \frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2} & \cdots & \frac{\partial f_m}{\partial x_n}         \end{bmatrix} \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

ANALOGY

An analogy for the Jacobian Matrix is how the movement of the steering wheel influences the trajectory of the car.

  • Input variables: Steering wheel rotation and speed.
  • Output variables: Direction and position of the car.

If a small turn of the steering wheel produces a pronounced change in direction, the corresponding derivative is large. The Jacobian Matrix expresses how each small change in an input affects each output.

We have the following simple vector function:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle f(x, y) =         \begin{bmatrix}             x^2 + y \\             \sin(x) + y^2         \end{bmatrix} \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

The Jacobian Matrix is calculated by taking the partial derivatives:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle J_f(x, y) =          \begin{bmatrix}             \frac{\partial}{\partial x} (x^2 + y) & \frac{\partial}{\partial y} (x^2 + y) \\             \frac{\partial}{\partial x} (\sin(x) + y^2) & \frac{\partial}{\partial y} (\sin(x) + y^2)         \end{bmatrix} \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

Calculating each derivative:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle J_f(x, y) =          \begin{bmatrix}             2x & 1 \\             \cos(x) & 2y         \end{bmatrix} \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

This matrix tells us how each component of the function f(x, y) changes when we change x and y.
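To see the matrix in action, the following sketch compares the analytic Jacobian above with a finite-difference estimate at an arbitrarily chosen point, (1, 2). The helper names `analytic_jacobian` and `numerical_jacobian` are illustrative and not part of the original text.

```python
import math


def f(x, y):
    """The example vector function [x^2 + y, sin(x) + y^2]."""
    return [x**2 + y, math.sin(x) + y**2]


def analytic_jacobian(x, y):
    """The matrix derived above: rows are outputs, columns are inputs."""
    return [[2 * x, 1.0], [math.cos(x), 2 * y]]


def numerical_jacobian(func, x, y, h=1e-6):
    """Central differences: entry [i][j] approximates d f_i / d input_j."""
    fx_plus, fx_minus = func(x + h, y), func(x - h, y)
    fy_plus, fy_minus = func(x, y + h), func(x, y - h)
    return [
        [(fx_plus[i] - fx_minus[i]) / (2 * h),
         (fy_plus[i] - fy_minus[i]) / (2 * h)]
        for i in range(len(fx_plus))
    ]


print("analytic: ", analytic_jacobian(1.0, 2.0))
print("numerical:", numerical_jacobian(f, 1.0, 2.0))
```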

In RL applications, the Jacobian matrix is used to:

  • compute how policy parameters affect the action distribution.
  • measure how small changes in policy parameters affect the selected action.
  • propagate derivatives backward over multiple timesteps to update policy parameters efficiently.
  • help prevent the agent from overfitting to noisy state transitions.

3. Hessian Matrix

The Hessian Matrix tells us how fast the slope changes in different directions, helping refine optimization techniques.

The Hessian Matrix is a square matrix of second-order partial derivatives of a scalar function.

Given a function f(x) with multiple variables, the Hessian Matrix H is defined as:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle H(f) =          \begin{bmatrix}             \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\             \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\             \vdots & \vdots & \ddots & \vdots \\             \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2}         \end{bmatrix} \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

ANALOGY

Imagine you’re skiing on a mountain. The Hessian tells you how the steepness changes.

  • If you’re at the bottom of a valley, the ground curves upward in every direction, and the Hessian is positive definite (a local minimum).
  • If you’re on a mountain pass, the ground curves upward in one direction but downward in another, and the Hessian is indefinite (a saddle point).
  • If you’re at the peak of a hill, the ground curves downward in every direction, and the Hessian is negative definite (a local maximum).

Consider the function:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle f(x, y) = x^2 + 2xy + y^2 \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

First, compute the first-order derivatives:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle \frac{\partial f}{\partial x} = 2x + 2y, \quad         \frac{\partial f}{\partial y} = 2x + 2y \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

Now, compute the second-order derivatives:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle \frac{\partial^2 f}{\partial x^2} = 2, \quad         \frac{\partial^2 f}{\partial y^2} = 2, \quad         \frac{\partial^2 f}{\partial x \partial y} = 2 \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

The Hessian matrix is:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle H(f) =          \begin{bmatrix}             2 & 2 \\             2 & 2         \end{bmatrix} \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

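Because every entry is constant, the curvature is the same at every point. It is not the same in every direction, however: the eigenvalues of this matrix are 4 and 0, so f(x, y) = (x + y)² curves upward along the direction (1, 1) and is completely flat along the direction (1, −1).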
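A short Python sketch can confirm the Hessian entries numerically and measure the curvature along individual directions. The helper names `numerical_hessian` and `curvature_along` are illustrative assumptions; the curvature along a unit vector u is computed as uᵀHu.

```python
import math


def f(x, y):
    """The example function f(x, y) = x^2 + 2xy + y^2."""
    return x**2 + 2 * x * y + y**2


def numerical_hessian(func, x, y, h=1e-3):
    """Finite-difference estimate of the 2x2 Hessian at (x, y)."""
    f_xx = (func(x + h, y) - 2 * func(x, y) + func(x - h, y)) / h**2
    f_yy = (func(x, y + h) - 2 * func(x, y) + func(x, y - h)) / h**2
    f_xy = (func(x + h, y + h) - func(x + h, y - h)
            - func(x - h, y + h) + func(x - h, y - h)) / (4 * h**2)
    return [[f_xx, f_xy], [f_xy, f_yy]]


def curvature_along(hessian, direction):
    """Second derivative along a direction: u^T H u for the unit vector u.

    Assumes a symmetric 2x2 Hessian, so the off-diagonal entry counts twice.
    """
    norm = math.hypot(direction[0], direction[1])
    ux, uy = direction[0] / norm, direction[1] / norm
    return (hessian[0][0] * ux * ux
            + 2 * hessian[0][1] * ux * uy
            + hessian[1][1] * uy * uy)


H = numerical_hessian(f, 0.5, -1.0)
print("numerical Hessian:", H)
print("curvature along (1, 1): ", curvature_along(H, (1, 1)))
print("curvature along (1, -1):", curvature_along(H, (1, -1)))
```

With the exact Hessian, the curvature is 4 along (1, 1) and 0 along (1, −1), which matches the factored form f(x, y) = (x + y)².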

In RL applications, the Hessian matrix is used to:

  • update policies more efficiently than standard gradient methods.
  • ensure policy updates remain within a safe range, preventing drastic changes in the policy that can lead to instability.
  • speed up convergence in policy optimization.
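One generic way second-order information can speed up optimization is Newton's method, which rescales the gradient by the inverse Hessian. The sketch below is not a specific RL algorithm; it minimizes a hypothetical objective g(x, y) = x² + 10y², chosen only because its Hessian, diag(2, 20), is constant and easy to invert. The learning rate is likewise an arbitrary assumption.

```python
def grad_g(x, y):
    """Gradient of the hypothetical objective g(x, y) = x^2 + 10y^2."""
    return 2 * x, 20 * y


def gradient_step(x, y, lr=0.04):
    """Plain gradient descent: one learning rate for every direction."""
    gx, gy = grad_g(x, y)
    return x - lr * gx, y - lr * gy


def newton_step(x, y):
    """Newton update p - H^{-1} grad, using the constant Hessian diag(2, 20)."""
    gx, gy = grad_g(x, y)
    return x - gx / 2.0, y - gy / 20.0


start = (3.0, -1.0)
print("after one gradient step:", gradient_step(*start))
print("after one Newton step:  ", newton_step(*start))
```

Because g is quadratic, the single Newton step lands exactly on the minimum at (0, 0), while the gradient step only moves part of the way there.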

4. Directional Derivative

Many real-world problems involve movement in arbitrary directions, not just along coordinate axes.

The Directional Derivative measures how a function changes as we move in a specific direction. In other words, the Directional Derivative allows us to measure change in any arbitrary direction, not just along the standard axes.

ANALOGY

The Directional Derivative answers this question: “If I walk in this particular direction, how fast will my altitude change?”

We have this function:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle f(x, y) = x^2 + y^2 \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

Find the directional derivative at (1, 1) in the direction of v=(3, 4).

STEP 1: Compute the Gradient

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle \nabla f(x, y) =          \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) =         (2x, 2y) \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

At (1,1):

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle \nabla f(1,1) = (2,2) \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

STEP 2: Normalize the Direction Vector

The given direction is v=(3,4). The unit vector is:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle \mathbf{v}_{\text{unit}} =          \frac{(3,4)}{\sqrt{3^2 + 4^2}} =          \left( \frac{3}{5}, \frac{4}{5} \right) \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

STEP 3: Compute the Directional Derivative

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle D_{\mathbf{v}} f(1,1) = \nabla f(1,1) \cdot \mathbf{v}_{\text{unit}} \\           \vspace{3mm} \\         \displaystyle = (2,2) \cdot \left( \frac{3}{5}, \frac{4}{5} \right) \\           \vspace{3mm} \\         \displaystyle = 2 \times \frac{3}{5} + 2 \times \frac{4}{5} = \frac{6}{5} + \frac{8}{5} = \frac{14}{5} = 2.8 \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

Interpretation: Starting at (1, 1) and moving in the direction of (3, 4), the function f(x, y) increases at a rate of 2.8 per unit of distance.
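The three steps translate directly into code. The sketch below (helper names are illustrative) computes the directional derivative as the dot product of the gradient with the unit direction, and cross-checks it by stepping a tiny distance along that direction.

```python
import math


def f(x, y):
    """The example function f(x, y) = x^2 + y^2."""
    return x**2 + y**2


def gradient_f(x, y):
    """Analytic gradient (2x, 2y) from STEP 1."""
    return 2 * x, 2 * y


def unit_vector(direction):
    """STEP 2: scale the direction to length 1."""
    norm = math.hypot(direction[0], direction[1])
    return direction[0] / norm, direction[1] / norm


def directional_derivative(point, direction):
    """STEP 3: dot product of the gradient with the unit direction."""
    gx, gy = gradient_f(*point)
    ux, uy = unit_vector(direction)
    return gx * ux + gy * uy


def finite_difference_along(point, direction, h=1e-6):
    """Cross-check: change in f per unit distance moved along the direction."""
    ux, uy = unit_vector(direction)
    x, y = point
    return (f(x + h * ux, y + h * uy) - f(x - h * ux, y - h * uy)) / (2 * h)


print("gradient formula: ", directional_derivative((1, 1), (3, 4)))
print("finite difference:", finite_difference_along((1, 1), (3, 4)))
```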

In RL applications, the Directional Derivative is used to:

  • explore specific directions of improvement in parameter space.
  • determine how changing weights in a specific direction affects expected rewards.
  • evaluate alternative update directions, for example ones chosen by information gain, instead of following a strict gradient update.

5. Total Derivative

A Total Derivative measures how the function changes as all related variables change.

In dynamical systems, control theory, and RL, states often evolve over time due to multiple factors, so a Total Derivative is needed to capture the full effect of changes.

ANALOGY

Imagine you’re climbing a mountain, but instead of just moving upwards, you are also moving sideways due to wind blowing you in a certain direction. The Total Derivative gives the rate of change in the direction you’re actually moving.

We have this function:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle f(x, y) = x^2 + y^2 \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

where both x and y depend on time t, such that:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle x = t^2, \quad y = \sin(t) \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

The Total Derivative is:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle \frac{d f}{d t} = \frac{\partial f}{\partial x} \frac{d x}{d t} + \frac{\partial f}{\partial y} \frac{d y}{d t} \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

Computing each term:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle \frac{\partial f}{\partial x} = 2x, \quad \frac{\partial f}{\partial y} = 2y \\           \vspace{3mm} \\         \displaystyle \frac{d x}{d t} = 2t, \quad \frac{d y}{d t} = \cos(t) \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

Thus:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle \frac{d f}{d t} = 2x (2t) + 2y (\cos t) \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

Substituting x = t² and y = sin t:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle \frac{d f}{d t} = 2 (t^2)(2t) + 2 (\sin t)(\cos t) \\           \vspace{5mm} \\         \displaystyle \frac{d f}{d t} = 4t^3 + 2 \sin t \cos t \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

This Total Derivative tells us how f(x, y) changes over time, considering both direct changes in x and y as well as their dependence on t.
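The chain-rule result can be checked against a direct numerical derivative of the composed function t ↦ f(x(t), y(t)). This is a minimal sketch; the function names and the test time t = 1.5 are arbitrary illustrative choices.

```python
import math


def f(x, y):
    """The example function f(x, y) = x^2 + y^2."""
    return x**2 + y**2


def total_derivative(t):
    """Chain rule: df/dt = (df/dx)(dx/dt) + (df/dy)(dy/dt)."""
    x, y = t**2, math.sin(t)
    df_dx, df_dy = 2 * x, 2 * y
    dx_dt, dy_dt = 2 * t, math.cos(t)
    return df_dx * dx_dt + df_dy * dy_dt


def closed_form(t):
    """The simplified result: 4t^3 + 2 sin(t) cos(t)."""
    return 4 * t**3 + 2 * math.sin(t) * math.cos(t)


def numerical_derivative(t, h=1e-6):
    """Central difference of the composed function t -> f(x(t), y(t))."""
    def composed(s):
        return f(s**2, math.sin(s))
    return (composed(t + h) - composed(t - h)) / (2 * h)


t = 1.5
print("chain rule: ", total_derivative(t))
print("closed form:", closed_form(t))
print("numerical:  ", numerical_derivative(t))
```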

In RL applications, the Total Derivative appears in cases like:

  • the gradient of the expected return, which involves a total derivative because rewards depend indirectly on actions and states, which themselves depend on previous decisions.
  • policy-gradient methods (e.g., DDPG, PPO), which compute how small changes in policy parameters influence long-term outcomes.

6. Partial Derivative

A Partial Derivative measures how a function changes as only one of its inputs changes.

In Reinforcement Learning, this is similar to adjusting one parameter at a time to see how it affects the overall performance, keeping other parameters fixed.

ANALOGY

Imagine you are baking a cake, and the taste depends on sugar, flour, and butter. If you want to know how changing only the sugar affects the taste while keeping flour and butter the same, you are computing a partial derivative.

The function is:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle f(x, y) = x^2 + 3xy + y^2 \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

Partial Derivative with respect to x is:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle \frac{\partial f}{\partial x} = 2x + 3y \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

Partial Derivative with respect to y is:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\          \displaystyle \frac{\partial f}{\partial y} = 3x + 2y \\           \vspace{5mm}      \end{array} }  \hspace{5mm} \]

These derivatives tell us:

  • How the function changes when x increases while y is held fixed.
  • How the function changes when y increases while x is held fixed.
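"Holding the other variable fixed" is exactly what a numerical partial derivative does: only one argument is nudged. The sketch below evaluates both partial derivatives at an arbitrarily chosen point, (1, 2), where the formulas above give 2·1 + 3·2 = 8 and 3·1 + 2·2 = 7. The helper names are illustrative.

```python
def f(x, y):
    """The example function f(x, y) = x^2 + 3xy + y^2."""
    return x**2 + 3 * x * y + y**2


def partial_x(func, x, y, h=1e-6):
    """Nudge x only; y stays fixed."""
    return (func(x + h, y) - func(x - h, y)) / (2 * h)


def partial_y(func, x, y, h=1e-6):
    """Nudge y only; x stays fixed."""
    return (func(x, y + h) - func(x, y - h)) / (2 * h)


print("df/dx at (1, 2):", round(partial_x(f, 1.0, 2.0), 6))
print("df/dy at (1, 2):", round(partial_y(f, 1.0, 2.0), 6))
```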

In RL applications, the Partial Derivative is applied in cases like:

  • Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO), which take partial derivatives of an advantage-weighted objective with respect to each policy parameter and adjust the policy accordingly.
  • value function approximation, where we compute partial derivatives of the loss function with respect to each network weight in order to update the weights.
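The weight-update idea can be sketched with the simplest possible approximator. Everything in the example below is a hypothetical stand-in: a linear value estimate with two weights, a single made-up training target, a squared-error loss L = ½(target − prediction)², and a learning rate of 0.1. It is not the update rule of any particular RL library.

```python
def predict(weights, features):
    """Linear value estimate: the sum of w_i * feature_i."""
    return sum(w * x for w, x in zip(weights, features))


def loss(weights, features, target):
    """Squared-error loss: 0.5 * (target - prediction)^2."""
    return 0.5 * (target - predict(weights, features)) ** 2


def loss_partials(weights, features, target):
    """Partial derivative of the loss with respect to each weight.

    dL/dw_i = -(target - prediction) * feature_i
    """
    error = target - predict(weights, features)
    return [-error * x for x in features]


def sgd_step(weights, features, target, lr=0.1):
    """Move each weight a small step against its own partial derivative."""
    grads = loss_partials(weights, features, target)
    return [w - lr * g for w, g in zip(weights, grads)]


weights = [0.0, 0.0]
features, target = [1.0, 2.0], 5.0  # hypothetical state features and target
new_weights = sgd_step(weights, features, target)
print("partials:   ", loss_partials(weights, features, target))
print("new weights:", new_weights)
print("loss before:", loss(weights, features, target))
print("loss after: ", loss(new_weights, features, target))
```

Each weight gets its own partial derivative, so a weight attached to a larger feature value receives a proportionally larger correction.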
