Artificial Neuron: The Tiny Decision Maker

This page was last edited on 12 November 2025

An artificial neuron is the basic unit in a neural network. It takes multiple inputs, applies a weight to each input, sums them up, adds a bias, and passes the result through an activation function. In Deep Reinforcement Learning (Deep RL), it’s used inside the neural networks that approximate functions like policies, value functions, or Q-values.

Why do we use artificial neurons in Deep RL?

We use them because they allow us to model complex, nonlinear relationships between input states and outputs (like actions or Q-values). Classical RL methods (like tabular Q-learning) don’t scale to large or continuous state spaces. Artificial neurons allow us to generalize and learn from raw input like images, sensor data, or multi-dimensional states.

Equation of the artificial neuron

The basic equation:

    \[ \text{output} = f\left( \sum_{i=1}^{n} w_i \cdot x_i + b \right) \]

Where:

  • x_i: input values (e.g., state features)
  • w_i: weights applied to each input
  • b: bias (a constant that shifts the output)
  • f: activation function (e.g., ReLU, sigmoid, tanh)
  • output: neuron output, sent to the next layer or interpreted as a prediction
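The equation above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name neuron_output and the sample values for x, w, and b are made up for this sketch and do not come from the text.

```python
import math


def relu(z):
    """ReLU activation: max(0, z)."""
    return max(0.0, z)


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(x, w, b, f):
    """Return f(sum_i w_i * x_i + b) for a single neuron."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b  # weighted sum plus bias
    return f(z)


# Hypothetical sample values, chosen only to show the effect of f.
x = [1.0, 2.0]
w = [0.5, -1.0]
b = 0.25

print(neuron_output(x, w, b, relu))       # weighted sum is -1.25, so ReLU gives 0.0
print(neuron_output(x, w, b, math.tanh))  # same weighted sum, squashed into (-1, 1)
print(neuron_output(x, w, b, sigmoid))    # same weighted sum, squashed into (0, 1)
```

Passing the activation as an argument keeps the weighted sum separate from the nonlinearity, which mirrors how the equation is written.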

ANALOGY

Think of a chef tasting a dish. The quantity of each ingredient is an input, and how strongly that ingredient affects the flavor is its weight. The chef combines them, adds salt (bias), and decides if the flavor is good (activation function). If the combined flavor is strong enough, the dish is accepted. If not, it’s rejected.

HISTORY

The idea started with the McCulloch-Pitts neuron in 1943, a simplified model of how real neurons work that was used to simulate logical functions. Later, in the 1950s, the perceptron was developed as the first trainable neural model. But a single-layer perceptron can only learn linearly separable functions (it cannot represent XOR, for example), and the field slowed down until the 1980s, when multilayer networks and backpropagation brought new energy. Artificial neurons have since become essential in modern AI and Deep RL.

To see how a single neuron learns, we work through a small example. We define:

  • Inputs: x = [0.5, 0.3]
  • Weights: w = [0.4, 0.7]
  • Bias: b = 0.1
  • Activation function: ReLU (f(z) = max(0, z))
  • Learning rate: 0.1
  • Target output: 1.0

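After each forward pass we calculate the error:

error = target – output

Each parameter is then moved in proportion to that error, scaled by the learning rate:

w_i = w_i + learning rate * error * x_i
b = b + learning rate * error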

ITERATION 1

  • Weighted sum: z= 0.5 * 0.4 + 0.3 * 0.7 + 0.1= 0.2 + 0.21 + 0.1 = 0.51
  • Output ReLU = 0.51
  • Error = 1.0 – 0.51 = 0.49
  • Update:
    • w_1 = 0.4 + 0.1 * 0.49 * 0.5 = 0.4245
    • w_2 = 0.7 + 0.1 * 0.49 * 0.3 = 0.7147
    • b = 0.1 + 0.1 * 0.49 = 0.149
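The arithmetic of this first iteration can be checked with a short sketch. The helper name train_step is hypothetical, and the update it applies is simply the one used in the bullets above; it assumes the weighted sum stays positive, so ReLU passes the value through unchanged.

```python
def relu(z):
    """ReLU activation: max(0, z)."""
    return max(0.0, z)


def train_step(x, w, b, target, lr):
    """Apply one update and return (output, error, new_w, new_b)."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    output = relu(z)
    error = target - output
    # Each weight moves by lr * error * its own input; the bias moves by lr * error.
    new_w = [w_i + lr * error * x_i for w_i, x_i in zip(w, x)]
    new_b = b + lr * error
    return output, error, new_w, new_b


output, error, w, b = train_step([0.5, 0.3], [0.4, 0.7], 0.1, target=1.0, lr=0.1)
print(round(output, 5), round(error, 5))      # 0.51 0.49
print([round(v, 5) for v in w], round(b, 5))  # [0.4245, 0.7147] 0.149
```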

ITERATION 2

  • Weighted sum: z = 0.5 * 0.4245 + 0.3 * 0.7147 + 0.149 = 0.21225 + 0.21441 + 0.149 = 0.57566
  • Output ReLU = 0.57566
  • Error = 0.42434
  • w1 = 0.4245 + 0.1 * 0.42434 * 0.5 = 0.4457
  • w2 = 0.7147 + 0.1 * 0.42434 * 0.3 = 0.7274
  • b = 0.149 + 0.1 * 0.42434 = 0.19143

ITERATION 3

  • Weighted sum: z = 0.5 * 0.4457 + 0.3 * 0.7274 + 0.19143 = 0.22285 + 0.21822 + 0.19143 = 0.6325
  • Output ReLU = 0.6325
  • Error = 0.3675
  • w1 = 0.4457 + 0.1 * 0.3675 * 0.5 = 0.46407
  • w2 = 0.7274 + 0.1 * 0.3675 * 0.3 = 0.73843
  • b = 0.19143 + 0.1 * 0.3675 = 0.22818

ITERATION 4

  • Weighted sum: z = 0.5 * 0.46407 + 0.3 * 0.73843 + 0.22818 = 0.23203 + 0.22153 + 0.22818 = 0.68174
  • Output ReLU = 0.68174
  • Error = 0.31826
  • w1 = 0.46407 + 0.1 * 0.31826 * 0.5 = 0.47998
  • w2 = 0.73843 + 0.1 * 0.31826 * 0.3 = 0.74798
  • b = 0.22818 + 0.1 * 0.31826 = 0.26001

ITERATION 5

  • Weighted sum: z = 0.5 * 0.47998 + 0.3 * 0.74798 + 0.26001 = 0.23999 + 0.22439 + 0.26001 = 0.72439
  • Output ReLU = 0.72439
  • Error = 0.27561
  • w1 = 0.47998 + 0.1 * 0.27561 * 0.5 = 0.49376
  • w2 = 0.74798 + 0.1 * 0.27561 * 0.3 = 0.75625
  • b = 0.26001 + 0.1 * 0.27561 = 0.28757
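The five iterations can also be reproduced with a loop. The sketch below is illustrative only: run_iterations is a hypothetical name, and because the code keeps full floating-point precision instead of rounding after every step, the last digits may differ slightly from the hand-computed values.

```python
def run_iterations(x, w, b, target, lr, steps):
    """Repeat the update rule and return the output recorded at each step."""
    w = list(w)  # copy so the caller's list is not modified
    outputs = []
    for _ in range(steps):
        z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        output = max(0.0, z)  # ReLU
        error = target - output
        w = [w_i + lr * error * x_i for w_i, x_i in zip(w, x)]
        b = b + lr * error
        outputs.append(output)
    return outputs, w, b


outputs, w, b = run_iterations([0.5, 0.3], [0.4, 0.7], 0.1, target=1.0, lr=0.1, steps=5)
for step, output in enumerate(outputs, start=1):
    print(f"iteration {step}: output = {output:.4f}")
```

The printed outputs should rise from 0.51 to roughly 0.72, moving toward the target of 1.0 with a smaller step each time because the error shrinks.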

Figure: Neuron Output Over 5 Iterations

The graph above shows the neuron’s output rising with each iteration, from 0.51 toward the target value of 1.0, while the error shrinks each time. This is the behavior expected from gradient descent: each update adjusts the weights and bias in the direction that reduces the error.
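
As a sketch of why the update used in the iterations counts as gradient descent, assume a squared-error loss (the text does not name a loss, so this choice is an assumption), with t for the target, y for the neuron output, and \eta for the learning rate:

    \[ L = \tfrac{1}{2}(t - y)^2, \qquad y = f(z), \qquad z = \sum_{i=1}^{n} w_i \cdot x_i + b \]

    \[ \frac{\partial L}{\partial w_i} = -(t - y)\, f'(z)\, x_i, \qquad \frac{\partial L}{\partial b} = -(t - y)\, f'(z) \]

For ReLU with z > 0, as in every iteration of the example, f'(z) = 1, so stepping against the gradient gives the same rule applied above:

    \[ w_i \leftarrow w_i + \eta\,(t - y)\,x_i, \qquad b \leftarrow b + \eta\,(t - y) \]

If z were zero or negative, the ReLU derivative would be 0 and a true gradient step would leave the weights unchanged, so this shortcut only matches gradient descent while the neuron is active.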

