What are Convolutions?

Introduction

After writing my article on how to build a CNN, I realized I never explained what a convolution is in terms of image recognition.

What are Convolutions?

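Loosely speaking, a convolution is a way of combining two sets of values by looking at how their entries pair up. A probability example gives the flavor. Suppose we have 2 lists,

["parrot", "dog", "cat"]
["bagel", "bread", "cheese"]

and we pick one item from each list at random. That gives 3 × 3 = 9 equally likely pairs.

What is the probability of getting a bagel?
(3/9, or ~33%, because 3 of the 9 pairs contain a bagel)

What is the probability of getting a cat or a bagel?
(5/9, or ~56%, because 5 of the 9 pairs contain a cat, a bagel, or both)

Combining two sets of values pair by pair like this is, loosely, what a convolution tries to do.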
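To check those two numbers, here is a small sketch that lists every pair and counts the ones we care about. It assumes we pick one item from each list with equal chance; the `probability` helper is just a name I made up for this example.

```python
from fractions import Fraction
from itertools import product

animals = ["parrot", "dog", "cat"]
foods = ["bagel", "bread", "cheese"]

# Every way of pairing one animal with one food: 3 * 3 = 9 pairs.
pairs = list(product(animals, foods))

def probability(event):
    """Fraction of the equally likely pairs for which event(pair) is True."""
    favorable = sum(1 for pair in pairs if event(pair))
    return Fraction(favorable, len(pairs))

p_bagel = probability(lambda pair: "bagel" in pair)
p_cat_or_bagel = probability(lambda pair: "cat" in pair or "bagel" in pair)

print(p_bagel)         # 1/3, which is 3/9 reduced
print(p_cat_or_bagel)  # 5/9
```

The "cat or bagel" case is 5 and not 6 because the pair ("cat", "bagel") would otherwise be counted twice.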


In the context of image recognition, we use a kernel: a small matrix of values that looks over the image, visiting every pixel unless you choose to skip some. At each position, the kernel's values are multiplied by the pixels underneath and added together, and the results form a new matrix that highlights the features the kernel is looking for.
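As a rough sketch of that scanning process, here is a plain-Python version. The `convolve2d` function, the 5x5 image, and the edge-detecting kernel are all made up for illustration, and real libraries do this far more efficiently. It assumes the kernel moves one pixel at a time with no extra border, and, as deep learning libraries typically do, it does not flip the kernel.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` one pixel at a time (no padding)."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Multiply each kernel value by the pixel under it and add them up.
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output

# Hypothetical 5x5 image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hypothetical 3x3 kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)  # each row prints [3, 3, 0]
```

The large values show up where the kernel sits on the edge, and the 0 shows up where everything under the kernel is the same brightness. That is the "highlighting" the new matrix does.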

 
(Image: a kernel scanning a picture of the number 2.)


Kernels also have a setting called stride, which tells the kernel how many pixels to move at each step. A stride of 1 moves it 1 pixel at a time. A stride of 2 moves it 2 pixels at a time, which means some positions are skipped and the output gets smaller. The example above uses a stride of 1. There is also padding, which adds extra pixels (usually zeros) around the border of the image so that the output keeps the same dimensions as the input.
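To see how stride and padding change the size of the result, here is a minimal sketch in plain Python. The helper names (`pad_image`, `output_size`, `convolve2d`) are my own, and it assumes a square kernel and zero padding; other padding styles exist.

```python
def pad_image(image, padding):
    """Surround `image` with `padding` rings of zeros."""
    width = len(image[0]) + 2 * padding
    top = [[0] * width for _ in range(padding)]
    middle = [[0] * padding + list(row) + [0] * padding for row in image]
    bottom = [[0] * width for _ in range(padding)]
    return top + middle + bottom

def output_size(input_size, kernel_size, stride=1, padding=0):
    """How many positions the kernel can take along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

def convolve2d(image, kernel, stride=1, padding=0):
    """Slide a square `kernel` over the zero-padded image, `stride` pixels per step."""
    padded = pad_image(image, padding)
    k = len(kernel)
    out_h = output_size(len(image), k, stride, padding)
    out_w = output_size(len(image[0]), k, stride, padding)
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            top, left = r * stride, c * stride  # stride decides the jump
            row.append(sum(
                padded[top + i][left + j] * kernel[i][j]
                for i in range(k) for j in range(k)
            ))
        output.append(row)
    return output

# Hypothetical 5x5 image and 3x3 kernel, both filled with ones.
image = [[1] * 5 for _ in range(5)]
kernel = [[1] * 3 for _ in range(3)]

strided = convolve2d(image, kernel, stride=2)   # skips every other position
same = convolve2d(image, kernel, padding=1)     # keeps the 5x5 dimensions

print(output_size(5, 3))             # 3
print(output_size(5, 3, stride=2))   # 2
print(output_size(5, 3, padding=1))  # 5
```

With one ring of zero padding, the 3x3 kernel produces a 5x5 output from a 5x5 image, which is the "preserve the dimensions" effect. With a stride of 2 and no padding, the output shrinks to 2x2.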

Using convolutional layers instead of normal linear layers works very well for image recognition. Instead of treating each pixel as an unrelated input, we have a kernel scanning for something specific, wherever it appears in the image. That's why convolutional layers can be beneficial! They can also be combined with several other kinds of layers to help your network.

Works Referenced/Extra Resources

These are the works I referenced while researching this article. I also included some other resources that I thought would be beneficial for everybody.

"Convolutional Neural Networks Explained (CNN Visualized)" by Futurology – An Optimistic Future.

"What are convolutional neural networks?" by IBM.

"All about convolutions, kernels, features in CNN" by Abhishek Jain (Medium).

"CNN | Introduction to Padding" by savyakhosla (GeeksforGeeks, accessed Sept 8, 2024).
