This page was last edited on 11 November 2025
Function Approximation is the process of estimating a function using a model when the true function is unknown or too complex.
In Reinforcement Learning (RL), it means using a parameterized function to estimate quantities such as value functions or policies.
Instead of storing exact values in a table, we approximate them.
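To make the contrast concrete, here is a minimal sketch comparing a lookup table with a parameterized estimate. The states, table values, and weights are hypothetical and chosen only for illustration.

```python
import numpy as np

# Tabular approach: one stored value per state.
# This only works for small, discrete state spaces.
value_table = {0: 0.0, 1: 0.5, 2: 1.0}   # hypothetical states and values

# Function approximation: a fixed number of parameters,
# regardless of how many states exist.
w = np.array([0.5, 0.0])                 # hypothetical weights: slope and bias

def v_hat(s: float) -> float:
    """Estimate the value of state s from the parameters w."""
    return float(w[0] * s + w[1])

print(value_table[1], v_hat(1))   # 0.5 0.5
print(v_hat(1.5))                 # 0.75, a state the table never stored
```

The table has no entry for state 1.5, while the parameterized function still returns an estimate for it.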
Why do we use Function Approximation in Deep RL?
Simple: the state/action space is too large (or continuous). We can’t store values for each state.
Function approximation helps generalize across similar states. It lets agents scale to complex tasks like robotics or games.
Equation for Function Approximation
The most basic one is:
$$
\hat{v}(s; \mathbf{w}) \approx v(s)
$$
Where:
- $\hat{v}(s; \mathbf{w})$: estimated value of state s
- w: parameters (e.g., weights of a neural net)
- v(s): true value of state s, which we don’t know
We adjust w with a gradient-based update:
$$
\mathbf{w} \leftarrow \mathbf{w} + \alpha \cdot \left( \text{target} - \hat{v}(s; \mathbf{w}) \right) \cdot \nabla_{\mathbf{w}} \hat{v}(s; \mathbf{w})
$$
Here α is the learning rate, and the target is the value we want the estimate to move toward (for example, an observed return or a bootstrapped estimate).
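The update rule above can be sketched in a few lines for the special case of a linear model, where the gradient with respect to w is simply the feature vector. The feature vector, target, and function names below are assumptions made for illustration, not part of any particular library.

```python
import numpy as np

def v_hat(features: np.ndarray, w: np.ndarray) -> float:
    """Linear value estimate: dot product of state features and weights."""
    return float(features @ w)

def update(w: np.ndarray, features: np.ndarray, target: float, alpha: float) -> np.ndarray:
    """One step of w <- w + alpha * (target - v_hat) * grad_w v_hat."""
    error = target - v_hat(features, w)
    grad = features  # for a linear model, grad_w v_hat is the feature vector
    return w + alpha * error * grad

w = np.zeros(2)
features = np.array([1.0, 1.0])  # hypothetical features: [x, bias]
w = update(w, features, target=3.0, alpha=0.1)
print(w)  # [0.3 0.3]
```

With a neural network, the only change in principle is that the gradient comes from backpropagation instead of being the feature vector itself.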
ANALOGY
Think of a painter. We give them a photo and they try to paint it.
They’ll never match it perfectly, but with enough time and tweaks, the result looks very close.
Function Approximation works the same way. It “paints” an approximation of a value function.
HISTORY
Function Approximation was used in early control theory. In RL, it gained traction with TD-Gammon (1992) by Gerald Tesauro.
He used a neural network to approximate value functions in backgammon, one of the first successes of combining RL with neural networks.
Steps to implement Function Approximation
- Choose the function to approximate (e.g., Q(s, a))
- Pick a model (linear, NN, etc.)
- Define the loss (e.g., MSE)
- Collect data (transitions)
- Train the model to minimize the loss
- Use the model for decision-making
- Repeat as we gather more data
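The steps above can be sketched end to end on a toy problem. Everything here is an assumption made for illustration: a hypothetical one-step task in which the state s lies in [0, 1], action 0 pays reward s, and action 1 pays reward 1 - s. Because each episode lasts one step, the training target is just the observed reward; in multi-step tasks the target would typically also include an estimate of future value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: the function to approximate is Q(s, a) for a hypothetical one-step task.
def true_reward(s: float, a: int) -> float:
    return s if a == 0 else 1.0 - s

# Step 2: a linear model with one weight vector per action over features [s, 1].
def features(s: float) -> np.ndarray:
    return np.array([s, 1.0])

W = np.zeros((2, 2))  # W[a] holds the weights for action a

def q_hat(s: float, a: int) -> float:
    return float(W[a] @ features(s))

alpha = 0.1
for _ in range(5000):  # Step 7: keep repeating as new data arrives
    # Step 4: collect one transition (s, a, r) with a random behaviour policy.
    s = rng.uniform(0.0, 1.0)
    a = int(rng.integers(2))
    r = true_reward(s, a)
    # Steps 3 and 5: one gradient step on the squared error 0.5 * (r - q_hat)^2.
    error = r - q_hat(s, a)
    W[a] += alpha * error * features(s)

# Step 6: use the model for decision-making by acting greedily.
def greedy_action(s: float) -> int:
    return int(np.argmax([q_hat(s, 0), q_hat(s, 1)]))

print(greedy_action(0.9), greedy_action(0.1))
```

After training, the greedy policy should pick action 0 for states near 1 and action 1 for states near 0, which matches how the toy rewards were defined.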
Inputs and Outputs of Function Approximation
Input:
- current state (or state-action pair). Example: [x, y, vx, vy]
Output:
- predicted value (V, Q, or policy output). Example: predicted Q-value for action A
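As a rough illustration of these inputs and outputs, the sketch below maps a state vector [x, y, vx, vy] to one predicted Q-value per action. The number of actions, the random weights, and the example state are all hypothetical; the model is untrained, so the numbers it produces carry no meaning beyond showing the shapes involved.

```python
import numpy as np

n_features = 4   # state: [x, y, vx, vy]
n_actions = 3    # hypothetical number of discrete actions

rng = np.random.default_rng(0)
W = rng.normal(size=(n_actions, n_features))  # untrained, random weights
b = np.zeros(n_actions)

def q_values(state: np.ndarray) -> np.ndarray:
    """Map one state vector to one predicted Q-value per action."""
    return W @ state + b

state = np.array([0.5, 1.0, -0.2, 0.0])  # example [x, y, vx, vy]
q = q_values(state)
print(q.shape)             # (3,)
print(int(np.argmax(q)))   # index of the action with the highest predicted Q-value
```

An alternative design takes the state-action pair as input and returns a single Q-value; the version here outputs all action values at once, which is convenient when the action set is small and discrete.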
EXAMPLE: Function Approximation for a simple linear model
Let’s say we want to approximate this true function:
$$
f(x) = 2x + 1
$$
We’ll use a simple linear model:
$$
\hat{f}(x; w_0, w_1) = w_0 \cdot x + w_1
$$
Initial weights and learning rate:
- w0 = 0.0, w1 = 0.0
- Learning rate α = 0.1
Training data point:
- x = 1, true value y = 3 (since 2 * 1 + 1 = 3)
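The per-weight updates used in the iterations that follow come from applying the general gradient rule to this linear model. A short derivation, assuming a squared-error loss (the loss is not stated explicitly for this example, so treat it as an assumption):

$$
L = \tfrac{1}{2}\left(y - \hat{f}(x; w_0, w_1)\right)^2, \qquad
\frac{\partial \hat{f}}{\partial w_0} = x, \qquad
\frac{\partial \hat{f}}{\partial w_1} = 1
$$

$$
w_0 \leftarrow w_0 + \alpha \cdot (y - \hat{y}) \cdot x, \qquad
w_1 \leftarrow w_1 + \alpha \cdot (y - \hat{y}) \cdot 1
$$

Since x = 1 for the single training point, both weights receive the same update in every iteration.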
ITERATION 1
Prediction:
ŷ = 0.0 * 1 + 0.0 = 0.0
Error = 3 − 0 = 3
Update:
- w0 = 0 + 0.1 * 3 * 1 (this 1 is the value of x) = 0.3
- w1 = 0 + 0.1 * 3 * 1 (this 1 is the bias input) = 0.3
ITERATION 2
Prediction:
ŷ = 0.3 * 1 + 0.3 = 0.6
Error = 3 − 0.6 = 2.4
Update:
- w0 = 0.3 + 0.1 * 2.4 * 1 = 0.54
- w1 = 0.3 + 0.1 * 2.4 * 1 = 0.54
ITERATION 3
Prediction:
ŷ = 0.54 * 1 + 0.54 = 1.08
Error = 3 − 1.08 = 1.92
Update:
- w0 = 0.54 + 0.1 * 1.92 * 1 = 0.732
- w1 = 0.54 + 0.1 * 1.92 * 1 = 0.732
ITERATION 4
Prediction:
ŷ = 0.732 * 1 + 0.732 = 1.464
Error = 3 − 1.464 = 1.536
Update:
- w0 = 0.732 + 0.1 * 1.536 * 1 = 0.8856
- w1 = 0.732 + 0.1 * 1.536 * 1 = 0.8856
ITERATION 5
Prediction:
ŷ = 0.8856 * 1 + 0.8856 = 1.7712
Error = 3 − 1.7712 = 1.2288
Update:
- w0 = 0.8856 + 0.1 * 1.2288 * 1 = 1.00848
- w1 = 0.8856 + 0.1 * 1.2288 * 1 = 1.00848

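The above graph shows how the predicted output ŷ improves over 5 iterations using function approximation.
With each update, the error shrinks by a factor of 0.8, so the prediction gets closer to the true value (3).
This illustrates how Function Approximation refines its estimate through gradient updates, step by step, learning from the error. Note that a single training point only pins down the sum w0 + w1 = 3; recovering the true weights (w0 = 2, w1 = 1) requires training on more than one value of x.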
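The five iterations can be reproduced with a short loop. This is a minimal sketch of the worked example, using the same starting weights, learning rate, and single training point; the variable names are chosen here for illustration.

```python
w0, w1 = 0.0, 0.0   # initial weights
alpha = 0.1         # learning rate
x, y = 1.0, 3.0     # single training point, y = 2 * 1 + 1

predictions = []
for i in range(1, 6):
    y_hat = w0 * x + w1          # prediction with the current weights
    error = y - y_hat            # true value minus prediction
    w0 += alpha * error * x      # gradient of y_hat w.r.t. w0 is x
    w1 += alpha * error * 1.0    # gradient of y_hat w.r.t. w1 is 1 (bias)
    predictions.append(y_hat)
    print(f"iteration {i}: y_hat={y_hat:.4f}, error={error:.4f}, "
          f"w0={w0:.5f}, w1={w1:.5f}")
```

Running the loop for more iterations shows the prediction approaching 3 while both weights approach 1.5, because one data point cannot distinguish the slope from the bias.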
References:
- Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction. MIT Press.
- “Welcome to Spinning Up in Deep RL!” – OpenAI.