Convolutional Neural Network: Meaning, Equation and Components

This page was last edited on 17 September 2025

A Convolutional Neural Network (CNN) is a deep learning model that extracts features from visual data like images or video frames. In Deep Reinforcement Learning (Deep RL), CNNs help agents “see” the environment. They turn raw pixels into features that the RL agent can understand and learn from.

Why do we use CNNs in Deep RL?

We use CNNs because raw images are too large and complex. A CNN model reduces image size while keeping important info. It learns patterns, edges, shapes—useful for understanding states in the environment. CNN layers help Deep RL agents make better decisions based on visual input.

CNN equation

The core operation is the convolution:

     \[ \hspace{5mm} \fbox{     \begin{array}{c}         \vspace{5mm} \\         \displaystyle          Y(i, j) = \sum_m \sum_n X(i + m, j + n) \cdot K(m, n) \\         \vspace{5mm}     \end{array} } \hspace{5mm} \]

Where:

  • X(i, j): input image pixel at (i, j)
  • K(m, n): kernel value at position (m, n)
  • Y(i, j): result (feature map) at output position (i, j)

Other parameters:

  • Stride – how much the filter moves.
  • Padding – adds borders to control output size.
  • Kernel – also called filter, slides over the image.

Main components of a CNN

  1. Convolutional Layer – does the filtering.
  2. Activation Function (ReLU) – introduces non-linearity.
  3. Pooling Layer – reduces size (e.g., max pooling).
  4. Fully Connected Layer – makes final decision or prediction.
  5. Flatten Layer – turns 2D to 1D before final layers.

ANALOGY

Think of a CNN like looking through a stencil. The stencil (kernel) slides over a photo (input), highlighting specific shapes or lines. Each pass shows a part of the image in a new way—edges, corners, patterns. CNN layers stack these views to understand the whole picture.

HISTORY

CNNs were first proposed in the 1980s. Yann LeCun built one of the first working CNNs in 1989 to recognize handwritten digits (LeNet-5). It was used by banks to read checks. Deep CNNs became popular after 2012, when AlexNet won the ImageNet competition using a convolutional neural network.

Steps to implement a CNN

  1. Define input shape (e.g., 28×28 image).
  2. Add a convolutional layer with filters.
  3. Add ReLU activation.
  4. Add pooling (optional).
  5. Repeat conv + ReLU + pooling if needed.
  6. Flatten the result.
  7. Add fully connected (dense) layers.
  8. Output prediction.

Concepts behind CNNs

  • Local receptive fields – each neuron sees a part of the image.
  • Weight sharing – same filter applied across the image.
  • Translation invariance – small shifts don’t change output much.
  • Hierarchical feature extraction – deeper layers learn more abstract patterns.

Input and output of CNN

In Deep RL, CNN transforms visual input into a meaningful state representation that is fed into the RL algorithm (like DQN, A3C, PPO).


References:


Adam Optimization << Previous | Next >> Q-Learning