This page was last edited on 12 November 2025
An artificial neuron is the basic unit in a neural network. It takes multiple inputs, applies a weight to each input, sums them up, adds a bias, and passes the result through an activation function. In Deep Reinforcement Learning (Deep RL), it’s used inside the neural networks that approximate functions like policies, value functions, or Q-values.
Why do we use artificial neurons in Deep RL?
We use them because networks of artificial neurons can model complex, nonlinear relationships between input states and outputs (like actions or Q-values). Classical RL methods such as tabular Q-learning store a separate value for every state (or state-action pair), so they don’t scale to large or continuous state spaces. A neural network instead generalizes across similar states and can learn directly from raw input like images, sensor data, or multi-dimensional states.
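To make the point about nonlinear relationships concrete: XOR is a classic function that a single linear threshold unit (a perceptron) cannot represent, because its 1-cases and 0-cases cannot be separated by a straight line. The hypothetical sketch below uses hand-picked weights, not learned ones, to show that two ReLU neurons feeding a linear output neuron can compute it. In Deep RL such weights would be learned rather than set by hand.

```python
def relu(z: float) -> float:
    """ReLU activation: max(0, z)."""
    return max(0.0, z)


def xor_net(x1: float, x2: float) -> float:
    """Two-layer network with hand-picked weights that computes XOR on {0, 1} inputs."""
    h1 = relu(1.0 * x1 + 1.0 * x2 + 0.0)  # hidden neuron 1: positive when at least one input is 1
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)  # hidden neuron 2: positive only when both inputs are 1
    return 1.0 * h1 - 2.0 * h2            # linear output neuron (identity activation)


for p in (0, 1):
    for q in (0, 1):
        print(p, q, xor_net(p, q))
```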
Equation of the artificial neuron
The basic equation:
$$
\text{output} = f\left( \sum_{i=1}^{n} w_i \cdot x_i + b \right)
$$
Where:
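- x_i: input values (e.g., state features)
- w_i: weights applied to each input
- b: bias (a learnable offset added to the weighted sum, which shifts the input to the activation function; it is updated during training like the weights)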
- f: activation function (e.g., ReLU, sigmoid, tanh)
- output: neuron output, sent to the next layer or interpreted as a prediction
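The equation above translates directly into a few lines of code. This is a minimal Python sketch of the forward pass, assuming plain Python lists for inputs and weights; the helper names `relu`, `sigmoid`, and `neuron_output` are illustrative, not taken from any library.

```python
import math
from typing import Callable, Sequence


def relu(z: float) -> float:
    """ReLU: passes positive values through and clips negatives to 0."""
    return max(0.0, z)


def sigmoid(z: float) -> float:
    """Logistic sigmoid: squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(
    x: Sequence[float],
    w: Sequence[float],
    b: float,
    f: Callable[[float], float] = relu,
) -> float:
    """Compute output = f(sum_i w_i * x_i + b) for a single artificial neuron."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b  # weighted sum plus bias
    return f(z)  # apply the activation function


# Same inputs, weights, and bias as the worked example later on this page
print(neuron_output([0.5, 0.3], [0.4, 0.7], 0.1))             # ReLU activation
print(neuron_output([0.5, 0.3], [0.4, 0.7], 0.1, math.tanh))  # tanh activation
print(neuron_output([0.5, 0.3], [0.4, 0.7], 0.1, sigmoid))    # sigmoid activation
```

Passing the activation as a function argument keeps the weighted-sum logic in one place, so switching between ReLU, sigmoid, and tanh changes only the last step.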
ANALOGY
Think of a chef tasting ingredients. Each ingredient (input) has a certain quantity and taste (weight). The chef combines them, adds salt (bias), and decides if the flavor is good (activation function). If the combined flavor is strong enough, the dish is accepted. If not, it’s rejected.
HISTORY
The idea started with the McCulloch-Pitts neuron in 1943, a simplified model of how real neurons work that was used to simulate logical functions. Later, in the 1950s, Frank Rosenblatt developed the perceptron, the first trainable neural model. But a single-layer perceptron can only separate classes that are linearly separable (it cannot represent XOR, for example), a limitation highlighted by Minsky and Papert in 1969, and the field slowed down until the 1980s, when multilayer networks trained with backpropagation brought new energy. Artificial neurons have since become essential in modern AI and Deep RL.
EXAMPLE: How the neuron learns via weight updates
We define:
- Inputs: x = [0.5, 0.3]
- Weights: w = [0.4, 0.7]
- Bias: b = 0.1
- Activation function: ReLU (f(z) = max(0, z))
- Learning rate: 0.1
- Target output: 1.0
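We calculate the error:
error = target – output
Then we update each weight and the bias:
new w_i = w_i + learning_rate * error * x_i
new b = b + learning_rate * error
This is one gradient-descent step on the squared error ½(target – output)²: while z > 0, the derivative of ReLU is 1, so the gradient of the loss is -error * x_i for each w_i and -error for b. Intermediate values below are rounded at each step.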
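The weight updates in the iterations below follow from gradient descent. Here is a minimal Python sketch of one step, assuming the squared-error loss ½(target − output)²; `update_step` and `relu_grad` are hypothetical helper names. It multiplies by ReLU's derivative, which is 1 when z > 0 (as in every iteration of this example) and 0 otherwise, so it matches the hand calculation here while also handling a neuron whose weighted sum is negative.

```python
from typing import List, Tuple


def relu(z: float) -> float:
    return max(0.0, z)


def relu_grad(z: float) -> float:
    # ReLU is not differentiable at 0; using 0 there is a common convention
    return 1.0 if z > 0 else 0.0


def update_step(x: List[float], w: List[float], b: float,
                target: float, lr: float) -> Tuple[List[float], float, float, float]:
    """One gradient-descent step on loss = 0.5 * (target - output) ** 2.

    Returns (new_w, new_b, output, error); output and error are measured before the update.
    """
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    output = relu(z)
    error = target - output
    # dLoss/dw_i = -error * relu'(z) * x_i, so moving against the gradient
    # adds lr * error * relu'(z) * x_i to each weight
    delta = lr * error * relu_grad(z)
    new_w = [w_i + delta * x_i for w_i, x_i in zip(w, x)]
    new_b = b + delta  # dLoss/db = -error * relu'(z)
    return new_w, new_b, output, error


w, b, out, err = update_step([0.5, 0.3], [0.4, 0.7], 0.1, target=1.0, lr=0.1)
print(w, b, out, err)
```

Treating the derivative at exactly z = 0 as 0 is a design choice; some implementations use 1 instead.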
ITERATION 1
- Weighted sum: z = 0.5 * 0.4 + 0.3 * 0.7 + 0.1 = 0.2 + 0.21 + 0.1 = 0.51
- Output ReLU = 0.51
- Error = 1.0 – 0.51 = 0.49
- Update:
  - w_1 = 0.4 + 0.1 * 0.49 * 0.5 = 0.4245
  - w_2 = 0.7 + 0.1 * 0.49 * 0.3 = 0.7147
  - b = 0.1 + 0.1 * 0.49 = 0.149
ITERATION 2
- Weighted sum: z = 0.5 * 0.4245 + 0.3 * 0.7147 + 0.149 = 0.21225 + 0.21441 + 0.149 = 0.57566
- Output ReLU = 0.57566
- Error = 0.42434
- w_1 = 0.4245 + 0.1 * 0.42434 * 0.5 = 0.4457
- w_2 = 0.7147 + 0.1 * 0.42434 * 0.3 = 0.7274
- b = 0.149 + 0.1 * 0.42434 = 0.19143
ITERATION 3
- Weighted sum: z = 0.5 * 0.4457 + 0.3 * 0.7274 + 0.19143 = 0.22285 + 0.21822 + 0.19143 = 0.6325
- Output ReLU = 0.6325
- Error = 0.3675
- w_1 = 0.4457 + 0.1 * 0.3675 * 0.5 = 0.46407
- w_2 = 0.7274 + 0.1 * 0.3675 * 0.3 = 0.73843
- b = 0.19143 + 0.1 * 0.3675 = 0.22818
ITERATION 4
- Weighted sum: z = 0.5 * 0.46407 + 0.3 * 0.73843 + 0.22818 = 0.23203 + 0.22153 + 0.22818 = 0.68174
- Output ReLU = 0.68174
- Error = 0.31826
- w_1 = 0.46407 + 0.1 * 0.31826 * 0.5 = 0.47998
- w_2 = 0.73843 + 0.1 * 0.31826 * 0.3 = 0.74798
- b = 0.22818 + 0.1 * 0.31826 = 0.26001
ITERATION 5
- Weighted sum: z = 0.5 * 0.47998 + 0.3 * 0.74798 + 0.26001 = 0.23999 + 0.22439 + 0.26001 = 0.72439
- Output ReLU = 0.72439
- Error = 0.27561
- w_1 = 0.47998 + 0.1 * 0.27561 * 0.5 = 0.49376
- w_2 = 0.74798 + 0.1 * 0.27561 * 0.3 = 0.75625
- b = 0.26001 + 0.1 * 0.27561 = 0.28757
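The five iterations above can be replayed with a short loop instead of manual arithmetic. This is a sketch using the example's setup (inputs [0.5, 0.3], weights [0.4, 0.7], bias 0.1, ReLU, learning rate 0.1, target 1.0); `train` is a hypothetical helper name, and the loop uses the simplified update without ReLU's derivative, which is valid only because z stays positive throughout this example. It keeps full precision rather than rounding at each step, so its printed values can differ slightly from the hand-computed ones.

```python
from typing import List, Tuple


def relu(z: float) -> float:
    return max(0.0, z)


def train(x: List[float], w: List[float], b: float, target: float = 1.0,
          lr: float = 0.1, iterations: int = 5) -> Tuple[List[float], float, List[Tuple[float, float]]]:
    """Repeat the example's update rule; returns final weights, final bias, and (output, error) per iteration."""
    w = list(w)  # copy so the caller's list is not modified
    history = []
    for _ in range(iterations):
        z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        output = relu(z)
        error = target - output
        # Same rule as the hand calculation: w_i += lr * error * x_i, b += lr * error
        # (this assumes z > 0, so ReLU's derivative is 1)
        w = [w_i + lr * error * x_i for w_i, x_i in zip(w, x)]
        b = b + lr * error
        history.append((output, error))
    return w, b, history


w, b, history = train([0.5, 0.3], [0.4, 0.7], 0.1)
for k, (output, error) in enumerate(history, start=1):
    print(f"iteration {k}: output = {output:.5f}, error = {error:.5f}")
print("final weights:", [round(v, 5) for v in w], "final bias:", round(b, 5))
```

At full precision the fifth output is about 0.72441, and the final parameters are roughly w ≈ [0.49378, 0.75627] and b ≈ 0.28756.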

The graph above shows how the neuron’s output increases with each iteration, from 0.51 to about 0.72 after five updates, approaching the target value of 1.0 while the error shrinks. This confirms that the neuron is learning by adjusting its weights and bias with gradient descent.
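Why the error shrinks so steadily can be worked out from the update rule. While ReLU is active, output = z, and one update changes z by learning_rate × error × (x1² + x2² + 1): each w_i moves by learning_rate × error × x_i, which is multiplied by x_i again in the weighted sum, and b moves by learning_rate × error. So each iteration multiplies the error by 1 − 0.1 × (0.25 + 0.09 + 1) = 0.866, which means the error decays geometrically and the output approaches 1.0 without ever exactly reaching it. The sketch below checks this numerically, assuming z stays positive (true for this example); the variable names are illustrative.

```python
def relu(z: float) -> float:
    return max(0.0, z)


x, w, b = [0.5, 0.3], [0.4, 0.7], 0.1
target, lr = 1.0, 0.1

# Predicted per-iteration shrink factor (valid only while z > 0): 1 - lr * (sum of x_i**2 + 1)
factor = 1.0 - lr * (sum(x_i ** 2 for x_i in x) + 1.0)

errors = []
for _ in range(5):
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    error = target - relu(z)
    errors.append(error)
    w = [w_i + lr * error * x_i for w_i, x_i in zip(w, x)]
    b += lr * error

# Closed form: error after k updates = first error * factor ** k
predicted = [errors[0] * factor ** k for k in range(5)]
print(f"shrink factor: {factor:.3f}")
for k, (sim, pred) in enumerate(zip(errors, predicted), start=1):
    print(f"iteration {k}: simulated error = {sim:.5f}, closed form = {pred:.5f}")
```

The same reasoning suggests a practical limit: if learning_rate × (x1² + x2² + 1) exceeded 2, the factor would fall below -1 and the error would grow instead of shrinking.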
References:
- McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
- Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction. MIT Press.