ReLU Activation

This page was last edited on 12 November 2025

ReLU stands for “Rectified Linear Unit.” It’s a simple activation function used in deep neural networks, including those used in Deep Reinforcement Learning (Deep RL).

ReLU decides whether an artificial neuron should be activated or not. If the input is positive, the activation function returns the input. If not, it returns 0.

ReLU adds non-linearity, which is crucial for learning complex patterns.
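A quick way to see the non-linearity is a minimal Python sketch (the function name `relu` and the sample values are illustrative, not taken from any library). A linear function f satisfies f(a + b) = f(a) + f(b); ReLU breaks this rule when the inputs have opposite signs.

```python
def relu(x):
    """Return x for positive inputs, 0 otherwise."""
    return max(0.0, x)

a, b = -2.0, 3.0

# For a linear function both sides would be equal.
combined = relu(a + b)        # relu(1.0)
separate = relu(a) + relu(b)  # 0.0 + 3.0

print(combined, separate)
```

The two values differ, which is what lets stacked layers represent more than a single linear map.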

Why do we use ReLU in Deep RL?

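ReLU is fast to compute and works well in practice. Its gradient is exactly 1 for every positive input, so it does not saturate the way sigmoid and tanh do. This reduces the vanishing-gradient problem and helps deep networks learn faster.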
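The gradient argument can be illustrated with a rough sketch. Backpropagation multiplies local derivatives layer by layer; the sigmoid derivative is at most 0.25, while the ReLU derivative is 1 for positive inputs. The depth, the shared pre-activation value, and the decision to ignore weights are all simplifying assumptions made for illustration.

```python
import math

def sigmoid_grad(x):
    """Derivative of the sigmoid; at most 0.25 (reached at x = 0)."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def chained_grad(local_grad, pre_activation, depth):
    """Multiply the same local derivative across `depth` layers (weights ignored)."""
    product = 1.0
    for _ in range(depth):
        product *= local_grad(pre_activation)
    return product

depth = 20            # hypothetical network depth
pre_activation = 0.5  # hypothetical positive input at every layer

sigmoid_chain = chained_grad(sigmoid_grad, pre_activation, depth)
relu_chain = chained_grad(relu_grad, pre_activation, depth)
print(sigmoid_chain, relu_chain)
```

Under these assumptions the sigmoid product shrinks toward 0 while the ReLU product stays at 1. For a negative pre-activation the ReLU derivative is 0, which is the source of the dead-neuron issue.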

Speed and stability matter in Deep RL applications, where agents need to learn from delayed and noisy feedback. ReLU passes positive signals through unchanged and sets negative ones to 0, so only part of the network is active for any given input.

When Should You Use ReLU?

Use ReLU when:

  • You need fast computation
  • Your model is not extremely deep
  • You’re building a Deep RL agent (e.g. DQN, PPO)

Avoid ReLU when:

  • You see a lot of dead neurons (units that output 0 for every input)
  • Your model needs smooth gradients
  • You train very deep networks with sparse rewards
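One rough way to check for dead neurons is to record a layer’s outputs over a batch and count the units that are 0 for every sample. The sketch below assumes the activations are available as a list of rows; the function name `dead_fraction` and the sample batch are hypothetical.

```python
def dead_fraction(activations):
    """Fraction of units whose output is 0 for every sample in the batch.

    `activations` is a list of rows, one row of post-ReLU outputs per sample.
    """
    num_units = len(activations[0])
    dead = 0
    for unit in range(num_units):
        if all(row[unit] == 0 for row in activations):
            dead += 1
    return dead / num_units

# Hypothetical post-ReLU outputs: 3 samples, 4 hidden units.
batch = [
    [0.0, 1.2, 0.0, 0.3],
    [0.0, 0.0, 0.0, 2.1],
    [0.0, 0.7, 0.0, 0.0],
]
print(dead_fraction(batch))  # units 0 and 2 never fire, so 0.5
```

A single batch only suggests that a unit is dead; a unit that stays at 0 across many batches is a stronger sign.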

Comparing ReLU with Other Activation Functions

The goal here is to help readers understand when and why to use ReLU.

ReLU vs. Swish, GELU, ELU, SELU

Function | Main Idea                     | Pros                                       | Cons
ReLU     | max(0, x)                     | Simple, fast, effective                    | Dying neurons, not smooth
Swish    | x * sigmoid(x)                | Smooth, often better than ReLU             | Slightly slower
GELU     | x * Φ(x) (Gaussian CDF)       | Used in transformers, smoother             | Complex, not always better
ELU      | Exponential for x < 0         | Avoids dead neurons                        | Slower, sensitive to params
SELU     | Scaled ELU (self-normalizing) | Works well in deep nets (with dropout off) | Restrictive conditions

  • Swish and GELU perform better in some modern architectures but are more computationally expensive.
  • ELU and SELU can prevent dying neurons and help with internal normalization, but they need careful tuning.
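For comparison, the functions above can be sketched in plain Python using their standard definitions. The ELU default `alpha=1.0` is an assumed common choice, and the SELU constants are the rounded values usually quoted for self-normalizing networks; neither is prescribed by this page.

```python
import math

def relu(x):
    return max(0.0, x)

def swish(x):
    # x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def gelu(x):
    # Exact form: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def elu(x, alpha=1.0):
    # alpha = 1.0 is an assumed default.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def selu(x):
    # Rounded constants commonly quoted for SELU.
    alpha, scale = 1.6732632423543772, 1.0507009873554805
    return scale * (x if x > 0 else alpha * (math.exp(x) - 1.0))

for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), swish(x), gelu(x), elu(x), selu(x))
```

For negative inputs, ReLU returns exactly 0 while the other four return small negative values, which is why they keep a non-zero gradient there.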

ReLU activation function equation

The equation is:

    \[ \text{ReLU}(x) = \max(0, x) \]

Where:

  • x is the input value (can be from an artificial neuron, layer, or linear function)
  • max picks the larger of 0 and x. If x>0, ReLU returns x. Otherwise, it returns 0.
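
The same function can be written piecewise, together with the derivative used during backpropagation. The derivative is undefined at x = 0; implementations pick a value there by convention, often 0.

    \[ \text{ReLU}(x) = \begin{cases} x, & x > 0 \\ 0, & x \le 0 \end{cases} \qquad \text{ReLU}'(x) = \begin{cases} 1, & x > 0 \\ 0, & x < 0 \end{cases} \]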

ANALOGY

We can imagine a water pipe with a one-way valve. Water (input) flows only if the pressure (value) is positive. If there’s no pressure (negative or zero), the valve blocks it.

ReLU is that valve—it only lets positive signals pass through.

HISTORY

ReLU started gaining attention in 2010 with the paper “Rectified Linear Units Improve Restricted Boltzmann Machines” by Nair and Hinton.

It outperformed sigmoid and tanh in deep networks. It was first widely used in computer vision tasks, and then in RL with DeepMind’s Deep Q-Networks (DQN) in 2015.

Steps to implement ReLU activation function

  • Take the input x.
  • Check if x>0.
  • If yes, return x.
  • If not, return 0.
  • Apply this element-wise to all inputs in a layer.
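
The steps above can be sketched in a few lines of Python. The function names `relu` and `relu_layer` are illustrative, and the inputs are the sample values from the worked example on this page.

```python
def relu(x):
    """Scalar ReLU: return x if it is positive, otherwise 0."""
    if x > 0:
        return x
    return 0

def relu_layer(inputs):
    """Apply ReLU element-wise to every input in a layer."""
    return [relu(x) for x in inputs]

layer_inputs = [-4, -1.2, 0, 3.5, 6.8]
layer_outputs = relu_layer(layer_inputs)
print(layer_outputs)  # [0, 0, 0, 3.5, 6.8]
```

In practice a numerical library would apply the same rule to a whole array at once; the loop here only makes each step explicit.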

Initial Inputs (one input per iteration):

  • Iteration 1: x= −4
  • Iteration 2: x= −1.2
  • Iteration 3: x= 0
  • Iteration 4: x= 3.5
  • Iteration 5: x= 6.8

ReLU Calculations:

  • Iteration 1: ReLU(-4)= max(0, -4)= 0
  • Iteration 2: ReLU(-1.2)= max(0, -1.2)= 0
  • Iteration 3: ReLU(0)= max(0, 0)= 0
  • Iteration 4: ReLU(3.5)= max(0, 3.5)= 3.5
  • Iteration 5: ReLU(6.8)= max(0, 6.8)= 6.8

Table of Results:

Iteration | Input (x) | ReLU(x)
1         | -4        | 0
2         | -1.2      | 0
3         | 0         | 0
4         | 3.5       | 3.5
5         | 6.8       | 6.8

Figure: ReLU Activation Function

The first three iterations output 0 because their input values were negative or zero.

The last two iterations output the input itself because ReLU lets positive values pass through unchanged.

This shows how ReLU filters out negative signals while preserving positive ones. It’s simple but powerful in shaping neural network activations.

