Function Approximation: Equation and Example

This page was last edited on 11 November 2025

Function Approximation is the process of estimating a function using a model when the true function is unknown or too complex.

In Reinforcement Learning (RL), it means using a parameterized function to estimate quantities such as value functions or policies.

Instead of storing exact values in a table, we approximate them.

Why do we use Function Approximation in Deep RL?

Simple: the state/action space is too large (or continuous), so we can’t store a separate value for every state.

Function approximation helps generalize across similar states. It lets agents scale to complex tasks like robotics or games.
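
A minimal sketch of what "generalize across similar states" can mean, assuming a scalar state and a linear estimate v̂(s) = w0 * s + w1. The states 1.0 and 1.1, the target 5.0, and the step size 0.1 are made-up values for illustration only.

```python
# Compare a lookup table with a linear approximator v_hat(s) = w0 * s + w1.
# All names and numbers here are illustrative assumptions.

def v_hat(s, w):
    """Linear value estimate for a scalar state s."""
    return w[0] * s + w[1]

def update(s, target, w, alpha=0.1):
    """One gradient step moving v_hat(s) toward target."""
    error = target - v_hat(s, w)
    return [w[0] + alpha * error * s, w[1] + alpha * error]

# Tabular case: updating state 1.0 says nothing about state 1.1.
table = {1.0: 0.0, 1.1: 0.0}
table[1.0] += 0.1 * (5.0 - table[1.0])

# Approximate case: the same update also shifts the estimate at 1.1.
w = update(1.0, 5.0, [0.0, 0.0])

table_neighbor = table[1.1]        # unchanged by the update
approx_neighbor = v_hat(1.1, w)    # moved toward the target
print(table_neighbor, approx_neighbor)
```

The table entry for the neighboring state stays at its initial value, while the shared weights change the approximator's estimate there too.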

Equation for Function Approximation

The most basic one is:

    \[ \boxed{ \hat{v}(s; \mathbf{w}) \approx v(s) } \]

Where:

  • v̂(s; w): estimated value of state s
  • w: parameters (e.g., weights of a neural net)
  • v(s): true value of state s, which we don’t know

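We adjust w using gradient descent on the squared error, with learning rate α and a target that stands in for the unknown v(s) (for example, an observed return or a TD target):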

    \[ \boxed{ \mathbf{w} \leftarrow \mathbf{w} + \alpha \cdot \left( \text{target} - \hat{v}(s; \mathbf{w}) \right) \cdot \nabla_{\mathbf{w}} \hat{v}(s; \mathbf{w}) } \]
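
The update rule above can be sketched in a few lines of Python. This assumes a linear model, where the estimate is a dot product of weights and state features and the gradient with respect to w is the feature vector itself. The function names `predict` and `gradient_step` and the example numbers are hypothetical.

```python
def predict(features, w):
    """Linear estimate: dot product of weights and state features."""
    return sum(wi * xi for wi, xi in zip(w, features))

def gradient_step(features, target, w, alpha):
    """Apply w <- w + alpha * (target - v_hat) * grad_w v_hat.

    For a linear model, grad_w v_hat is just the feature vector.
    """
    error = target - predict(features, w)
    return [wi + alpha * error * xi for wi, xi in zip(w, features)]

# Hypothetical state with features [1.0, 2.0] and a target of 1.0.
w = [0.0, 0.0]
features = [1.0, 2.0]
before = abs(1.0 - predict(features, w))
w = gradient_step(features, target=1.0, w=w, alpha=0.1)
after = abs(1.0 - predict(features, w))
print(before, after)  # the error shrinks after the step
```

For a neural network the structure is the same, but the gradient would come from backpropagation rather than from the raw features.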

ANALOGY

Think of a painter. We give them a photo and they try to paint it.

They’ll never match it perfectly, but with enough time and tweaks, the result looks very close.

Function Approximation works the same. It “paints” an approximation of a value function.

HISTORY

Function Approximation was used in early control theory. In RL, it gained traction with TD-Gammon (1992) by Gerald Tesauro.

He used a neural network to approximate the value function in backgammon, one of the first successes in combining RL with neural networks.

Steps to implement Function Approximation

  1. Choose the function to approximate (e.g., Q(s, a))
  2. Pick a model (linear, NN, etc.)
  3. Define the loss (e.g., MSE)
  4. Collect data (transitions)
  5. Train the model to minimize the loss
  6. Use the model for decision-making
  7. Repeat as we gather more data
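
The steps above can be sketched end to end on a toy problem. Everything specific here is an assumption made for illustration: a scalar state in [0, 1], two actions, a hypothetical `reward` function, episodes that end after one step (so the target is just the observed reward), and one linear model per action.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Steps 1-2: approximate Q(s, a) with one linear model per action.
# w[a] = [slope, bias], so Q(s, a) = w[a][0] * s + w[a][1].
w = [[0.0, 0.0], [0.0, 0.0]]

def q(s, a):
    return w[a][0] * s + w[a][1]

def reward(s, a):
    """Hypothetical one-step task: action 0 pays s, action 1 pays 1 - s."""
    return s if a == 0 else 1.0 - s

# Step 4: collect transitions (state, action, reward) with a random policy.
data = []
for _ in range(5000):
    s = rng.random()
    a = rng.randrange(2)
    data.append((s, a, reward(s, a)))

# Steps 3 and 5: minimize squared error with stochastic gradient descent.
alpha = 0.1
for s, a, r in data:
    error = r - q(s, a)          # the target is the observed reward
    w[a][0] += alpha * error * s
    w[a][1] += alpha * error

# Step 6: act greedily with respect to the learned Q-values.
def greedy_action(s):
    return max(range(2), key=lambda a: q(s, a))

print(greedy_action(0.9), greedy_action(0.1))
```

Under these assumptions the learned weights should end up close to [1, 0] for action 0 and [-1, 1] for action 1, so the greedy policy prefers action 0 for states above 0.5 and action 1 below it. Step 7 would correspond to collecting more data and running the training loop again.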

Inputs and Outputs of Function Approximation

Input:

  • current state (or state-action pair). Example: [x, y, vx, vy]

Output:

  • predicted value (V, Q, or policy output). Example: predicted Q-value for action A
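
As a rough illustration of this input/output shape, the sketch below maps a state vector [x, y, vx, vy] to one predicted Q-value per action. The function name `q_values`, the two-action setup, and the weight values are hypothetical; the weights are made-up numbers, not learned ones.

```python
def q_values(state, weights):
    """Return one predicted Q-value per action for a state vector.

    weights[a] holds one weight per state feature for action a.
    """
    return [sum(w * x for w, x in zip(row, state)) for row in weights]

# Hypothetical input: position (x, y) and velocity (vx, vy).
state = [1.0, 2.0, 0.5, -0.5]

# Hypothetical weights for two actions (made-up numbers, not learned).
weights = [
    [0.1, 0.0, 0.2, 0.0],   # action 0
    [0.0, 0.1, 0.0, 0.2],   # action 1
]

outputs = q_values(state, weights)
print(outputs)
```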

Let’s say we want to approximate this true function:

    \[ \boxed{ f(x) = 2x + 1 } \]

We’ll use a simple linear model:

    \[ \boxed{ \hat{f}(x; w_0, w_1) = w_0 \cdot x + w_1 } \]

Initial weights:

  • w0 = 0.0, w1 = 0.0
  • Learning rate α = 0.1

Training data point:

  • x=1, true value y=3 (since 2*1 + 1 = 3)
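
For this linear model the gradient in the update rule can be written out directly. Writing ŷ for the prediction at x (a shorthand used in the iterations that follow), the partial derivatives are:

    \[ \frac{\partial \hat{f}}{\partial w_0} = x, \qquad \frac{\partial \hat{f}}{\partial w_1} = 1 \]

So each iteration applies:

    \[ w_0 \leftarrow w_0 + \alpha \cdot (y - \hat{y}) \cdot x, \qquad w_1 \leftarrow w_1 + \alpha \cdot (y - \hat{y}) \cdot 1 \]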

ITERATION 1

Prediction:

ŷ = 0.0 * 1 + 0.0 = 0.0

Error = 3 − 0 = 3

Update:

  • w0 = 0 + 0.1 * 3 * 1 = 0.3 (the final 1 is the value of x, the gradient with respect to w0)
  • w1 = 0 + 0.1 * 3 * 1 = 0.3 (the final 1 is the constant bias input, the gradient with respect to w1)

ITERATION 2

Prediction:

ŷ = 0.3 * 1 + 0.3 = 0.6

Error = 3 − 0.6 = 2.4

Update:

  • w0 = 0.3 + 0.1 * 2.4 * 1 = 0.54
  • w1 = 0.3 + 0.1 * 2.4 * 1 = 0.54

ITERATION 3

Prediction:

ŷ = 0.54 * 1 + 0.54 = 1.08

Error = 3 − 1.08 = 1.92

Update:

  • w0 = 0.54 + 0.1 * 1.92 * 1 = 0.732
  • w1 = 0.54 + 0.1 * 1.92 * 1 = 0.732

ITERATION 4

Prediction:

ŷ = 0.732 * 1 + 0.732 = 1.464

Error = 3 − 1.464 = 1.536

Update:

  • w0 = 0.732 + 0.1 * 1.536 * 1 = 0.8856
  • w1 = 0.732 + 0.1 * 1.536 * 1 = 0.8856

ITERATION 5

Prediction:

ŷ = 0.8856 * 1 + 0.8856 = 1.7712

Error = 3 − 1.7712 = 1.2288

Update:

  • w0 = 0.8856 + 0.1 * 1.2288 * 1 = 1.00848
  • w1 = 0.8856 + 0.1 * 1.2288 * 1 = 1.00848
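
The five iterations above can be reproduced with a short loop. This is a sketch using the same starting weights, learning rate, and single training point; the function name `run` is an assumption for illustration.

```python
def run(iterations, x=1.0, y=3.0, alpha=0.1):
    """Repeat the single-point update and record each prediction."""
    w0, w1 = 0.0, 0.0
    predictions = []
    for _ in range(iterations):
        y_hat = w0 * x + w1
        error = y - y_hat
        predictions.append(y_hat)
        w0 += alpha * error * x   # gradient w.r.t. w0 is x
        w1 += alpha * error       # gradient w.r.t. w1 is 1
    return predictions, (w0, w1)

predictions, weights = run(5)
print([round(p, 4) for p in predictions])   # [0.0, 0.6, 1.08, 1.464, 1.7712]
print(tuple(round(w, 5) for w in weights))  # (1.00848, 1.00848)
```

Calling `run` with a larger iteration count shows the prediction continuing to approach 3.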

Figure: Function Approximation Over 5 Iterations

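The above graph shows how the predicted output ŷ improves over 5 iterations using function approximation.

With each update, the prediction gets closer to the true value (3): the error shrinks by a factor of 0.8 per iteration (3, 2.4, 1.92, 1.536, 1.2288).
This illustrates how Function Approximation refines its estimate through gradient updates, learning from the error step by step. Note that a single training point cannot pin down both weights: here w0 and w1 stay equal and head toward 1.5 each, which fits x = 1 but does not recover f(x) = 2x + 1. Recovering the true weights requires training on more than one value of x.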

