Using PyTorch to create a Convolutional Neural Network for Image Classification of Numbers

September 01, 2024

Introduction

Just a few notes before I get started:
1) Be on the lookout for overfitting in your models. Do not use too many layers: a network that is tuned too closely to its training dataset can give the wrong answer on images it has not seen before (personal experience).

2) .to('cuda') vs .to('cpu'): if you have a compatible NVIDIA graphics card with the CUDA drivers installed, check that PyTorch can see it before proceeding with my code. If you don't have an NVIDIA graphics card, or don't want to use it, you can use ".to('cpu')" to run on your CPU instead.

3) I am using the MNIST number dataset on Google Colab with a T4 GPU, and I have also tested the code on an RTX 3060. You can use ".to('cuda')" for either.
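A minimal sketch of the check from note 2, assuming you want the code to fall back to the CPU automatically. The variable names device and batch are only for illustration.

```python
import torch

# Use the GPU only when PyTorch can actually see a CUDA device.
device = "cuda" if torch.cuda.is_available() else "cpu"

# The same .to(...) call works for tensors and for models.
batch = torch.zeros(32, 1, 28, 28).to(device)
print(f"running on {device}; tensor lives on {batch.device}")
```

Passing device around instead of hard-coding 'cuda' means the same script runs on Colab, on a desktop GPU, or on a laptop without one.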

Framing + Implementation

Our goal is to get the computer to recognize the handwritten digits 0-9. To do this, we will use a Convolutional Neural Network. Personally, I will be using the PyTorch library to accomplish this, since it's simple and already has a built-in function to download this dataset.

First, add the import statements. We will use them for the entire project.

Next, let's load the MNIST data.

*Note: transforming each image to a tensor is crucial, because the network only works on tensors. I used a batch size of 32, meaning 32 images are processed in each iteration through the network.

Next, we have to create our Neural Network, specifically using convolutions. Start by creating a class.

Next, we're going to build the actual network. We will use a sequential model to build our neural network, saving it under self.inital_model.

This container will hold the whole model: the convolutional layers, the ReLU activation function applied after each one, a step that flattens the result, and a final step that converts it into the output. Here's what our first layer will look like.
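A minimal sketch of a first layer, assuming 1 input channel (MNIST is grayscale), 32 output channels, and a 3x3 kernel. The channel count is my assumption; any reasonable number works.

```python
import torch
from torch import nn

# Assumed sizes: 1 input channel (grayscale), 32 output channels, 3x3 kernel.
first_layer = nn.Sequential(
    nn.Conv2d(1, 32, (3, 3)),
    nn.ReLU(),
)

dummy = torch.zeros(1, 1, 28, 28)   # one fake 28x28 grayscale image
out = first_layer(dummy)
print(out.shape)                     # torch.Size([1, 32, 26, 26])
```

Notice that the 28x28 image comes out as 26x26: the convolution has already trimmed 2 pixels from each dimension.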

The last part is what we use to transform the image. We subtract 2 pixels per layer because, in this situation, each convolutional layer shaves 2 pixels off the image's height and width (this is what a 3x3 kernel with no padding does). Let's create 2 more layers (for a total of 3).
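The pixel arithmetic can be checked in plain Python. This sketch assumes a 3x3 kernel, stride 1, no padding, and 64 channels coming out of the last convolutional layer; conv_output_size is a hypothetical helper, not a PyTorch function.

```python
def conv_output_size(size, kernel=3, stride=1, padding=0):
    """Side length of a square image after one convolution."""
    return (size + 2 * padding - kernel) // stride + 1


side = 28                      # MNIST images are 28x28
for layer in range(1, 4):      # three convolutional layers
    side = conv_output_size(side)
    print(f"after layer {layer}: {side}x{side}")

channels = 64                  # assumed channel count of the last conv layer
flat_features = channels * side * side
print(flat_features)           # 64 * 22 * 22 = 30976
```

That final number is the input size the linear layer needs after flattening, which is why the layer is written in terms of 28 - 6.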

Next is the forward method, which will make sure the images pass through the Neural Network.

*Note: There is an indentation error in the forward method as I posted it. Make sure you indent it one more level, so that it sits inside the class.
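Putting the pieces together, a sketch of the whole class could look like this, with forward indented inside the class. The class name ImageClassifier and the channel counts (32, 64, 64) are assumptions; the attribute name is spelled the way it is in this post.

```python
import torch
from torch import nn


class ImageClassifier(nn.Module):   # hypothetical class name
    def __init__(self):
        super().__init__()
        # Attribute name spelled as in the post.
        self.inital_model = nn.Sequential(
            nn.Conv2d(1, 32, (3, 3)),
            nn.ReLU(),
            nn.Conv2d(32, 64, (3, 3)),
            nn.ReLU(),
            nn.Conv2d(64, 64, (3, 3)),
            nn.ReLU(),
            nn.Flatten(),
            # 28 - 2 * 3 = 22 pixels per side remain after three 3x3 convolutions.
            nn.Linear(64 * (28 - 6) * (28 - 6), 10),
        )

    # forward sits inside the class, at the same indentation as __init__.
    def forward(self, x):
        return self.inital_model(x)


model = ImageClassifier()
scores = model(torch.zeros(4, 1, 28, 28))   # a fake batch of 4 images
print(scores.shape)                          # torch.Size([4, 10])
```

The output has one row per image and 10 columns, one score for each digit.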

We're going to print the model to inspect its layers.

The printout should list each layer of the network in order.
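As a rough sketch, printing works on any PyTorch module. The stand-in below uses the same assumed layer sizes as a plain nn.Sequential so that it runs on its own, and it also counts the trainable parameters as a sanity check.

```python
from torch import nn

# Stand-in for the network, using the same assumed layer sizes.
model = nn.Sequential(
    nn.Conv2d(1, 32, (3, 3)), nn.ReLU(),
    nn.Conv2d(32, 64, (3, 3)), nn.ReLU(),
    nn.Conv2d(64, 64, (3, 3)), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(64 * 22 * 22, 10),
)

print(model)   # lists every layer with its index and settings

# Count the trainable parameters as a quick sanity check.
total = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {total:,}")
```

With these assumed sizes the count comes to 365,514, and most of it sits in the final linear layer (30,976 x 10 weights plus 10 biases).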

We now need to build the backpropagation portion of the network. First, we will use the Adam optimizer (it's really good at learning and helping fine-tune the parameters) and the Cross Entropy Loss (it measures how far the predicted class probabilities, each between 0 and 1, are from the correct label).

LR stands for "learning rate." The value "1e-3" is scientific notation: it means 1 x 10^-3, which is 0.001.
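A minimal sketch of the optimizer and loss setup, assuming a learning rate of 1e-3. The tiny placeholder model is only there so the snippet runs on its own; use your CNN in its place.

```python
import torch
from torch import nn
from torch.optim import Adam

# Placeholder model so the snippet runs on its own; use your CNN instead.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

opt = Adam(model.parameters(), lr=1e-3)   # 1e-3 is 0.001
loss_fn = nn.CrossEntropyLoss()

# Cross entropy takes raw scores (logits) and the index of the correct class.
logits = torch.tensor([[0.1, 0.2, 3.0]])   # the model favors class 2
right = loss_fn(logits, torch.tensor([2]))
wrong = loss_fn(logits, torch.tensor([0]))
print(1e-3 == 0.001, right.item() < wrong.item())   # True True
```

The loss is small when the highest score lands on the correct class and large when it doesn't, which is the signal the optimizer uses to adjust the weights.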

Next, we're going to create our training flow to train our model. We are going to save the trained weights in a file called "model_state.pt". I will be using 5 epochs.
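A training flow along these lines can be sketched as a single function. The function name train and its parameters are assumptions; batches can be a DataLoader or any iterable of (images, labels) pairs.

```python
import torch
from torch import nn
from torch.optim import Adam


def train(model, batches, epochs=5, path="model_state.pt", device="cpu"):
    """Train `model` on (images, labels) batches and save its weights to `path`."""
    model.to(device)
    model.train()
    opt = Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    history = []

    for epoch in range(epochs):
        for images, labels in batches:
            images, labels = images.to(device), labels.to(device)
            loss = loss_fn(model(images), labels)

            opt.zero_grad()   # clear gradients from the previous step
            loss.backward()   # backpropagation
            opt.step()        # update the weights

        history.append(loss.item())   # loss of the last batch in this epoch
        print(f"epoch {epoch} loss {history[-1]:.4f}")

    torch.save(model.state_dict(), path)
    return history
```

Saving only the state_dict keeps the file small; to reuse it later you rebuild the same class and load the weights back into it.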

Once this is all done, make sure to run your code! I'm going to go ahead and test this network. My test image is going to be the number 2.
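Testing on a single image can be sketched like this. The function name predict_digit is hypothetical, and it assumes the image has already been converted to a tensor of shape (1, 28, 28).

```python
import torch


def predict_digit(model, image, device="cpu"):
    """Return the most likely digit (0-9) for one image tensor of shape (1, 28, 28)."""
    model.to(device)
    model.eval()
    with torch.no_grad():                               # no gradients needed here
        scores = model(image.unsqueeze(0).to(device))   # add a batch dimension
    return int(torch.argmax(scores, dim=1).item())
```

Before calling it, load the saved weights with model.load_state_dict(torch.load("model_state.pt", map_location=device)), and convert your own image file to a tensor with ToTensor().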

It's right! Congrats, you just learned to make your first CNN!!!
