How a "GPT" Works
What is a GPT?
In Artificial Intelligence, GPT stands for "Generative Pre-trained Transformer." This means the GPT (called the model) can "synthesize" an output from a user's input because it was previously given data to train on and found patterns hidden within it. The word "transformer" has a little more backstory that we need to dive into.
The idea of the transformer was first introduced in 2017, in a paper titled "Attention Is All You Need" by eight Google researchers. This paper introduced a new way of training models: letting them dynamically focus on the important elements of the input, a technique called the attention mechanism (Vaswani et al., 2017). In a modern transformer network (at least the decoder-only kind I will explain in this article), the vector for the last token is used to predict the next token. That vector is modified using the context the user gave in previous interactions and the prompt itself.
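The attention mechanism can be sketched in a few lines of NumPy. This is a simplified illustration only: one attention head, random placeholder vectors, and no masking or learned weight matrices, so it is nothing like a production model, but it shows how each token's output becomes a weighted mix of all the tokens it attends to.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core attention step from 'Attention Is All You Need' (simplified)."""
    d_k = K.shape[-1]
    # scores: how strongly each query token "attends" to each key token
    scores = Q @ K.T / np.sqrt(d_k)
    # softmax turns each row of scores into weights that sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # output: each token's result is a weighted mix of the value vectors
    return weights @ V, weights

# toy example: 3 tokens, each a made-up 4-dimensional vector
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(X, X, X)  # self-attention: Q = K = V
print(w.sum(axis=-1))  # every row of attention weights sums to 1
```

Real transformers also multiply the input by learned query, key, and value matrices and run many such heads in parallel; this sketch skips all of that to keep the core idea visible.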
*Credit to 3Blue1Brown; I will be using his explanation of how a decoder-only transformer works.
1) Token assigning: a body of text is broken down into what are called "tokens" (sometimes words, parts of words, a few pixels on a screen, etc.). Here's an example that we will use:
|I| |love| |a| |lot| |of| |cho| |co| |late| |.| |Th| |is| |me| |ans| |I| |li| |ke|
(Notice how "chocolate" is broken down into several chunks.)
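A toy version of this splitting can be written as a greedy longest-match tokenizer. The vocabulary below is made up to match the example; real GPTs instead use byte-pair encoding with a learned vocabulary of roughly 50,000 tokens.

```python
# Made-up mini-vocabulary matching the example sentence above.
VOCAB = {"I", "love", "a", "lot", "of", "cho", "co", "late", ".",
         "Th", "is", "me", "ans", "li", "ke"}

def tokenize(word):
    """Split one word into the longest vocabulary pieces, left to right."""
    tokens = []
    while word:
        for end in range(len(word), 0, -1):  # try the longest piece first
            if word[:end] in VOCAB:
                tokens.append(word[:end])
                word = word[end:]
                break
        else:
            tokens.append(word[0])  # unknown character: emit it on its own
            word = word[1:]
    return tokens

print(tokenize("chocolate"))  # ['cho', 'co', 'late']
print(tokenize("This"))       # ['Th', 'is']
```

"chocolate" isn't in the vocabulary as a whole word, so the tokenizer falls back to the chunks "cho", "co", and "late", just like in the example.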
Each token is then assigned a vector. For example, the token "of" could be assigned the vector pictured below.
[Figure: an example token vector, modeled in Desmos 3D. Credit: Desmos 3D.]
4) Doing this for all tokens: the network assigns a vector like this to every token in the input.
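The steps above can be sketched with an embedding table: one row of numbers per vocabulary entry. The values here are random placeholders and the vectors are only 4-dimensional for the demo; in a real GPT the table has tens of thousands of rows, hundreds of columns, and values learned during training.

```python
import numpy as np

# Tokens from the running example sentence.
tokens = ["I", "love", "a", "lot", "of", "cho", "co", "late", "."]

# Assign each distinct token a row index in the embedding table.
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}

# Placeholder embedding table: one random 4-dimensional vector per token.
rng = np.random.default_rng(42)
embedding_table = rng.normal(size=(len(vocab), 4))

# Look up the vector for every token in the input.
vectors = np.array([embedding_table[vocab[t]] for t in tokens])
print(vectors.shape)  # one 4-dimensional vector per token -> (9, 4)
```

In training, these vectors shift so that tokens used in similar contexts end up with similar vectors, which is what lets later layers work with meaning rather than raw text.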
Why use them? Where are they used in the real world?
GPTs can be very useful when you just want a simple answer. Most people use ChatGPT by OpenAI to be more efficient, to find a simpler and cleaner explanation than the internet could otherwise offer, to learn from it, and much more. Some other examples are Microsoft Copilot and Gemini by Google.
Fun fact: Siri by Apple and Google Translate are AI models!
In a broader sense, transformers help create coherent sentences in a way that previous machine-learning models could not. This field is called Natural Language Processing (NLP), which attempts to make AI understand and retain what it reads from humans, and write like them as well.
Therefore, transformers are typically used for tasks that involve a lot of context clues and large inputs, boosting speed by focusing on the important key tokens (like humans do). In the real world, transformers are used for machine translation (using a machine to translate text from one language to another), sentiment analysis, analyzing input data, and much more.
Why shouldn't you use them? What are the drawbacks?
When it comes to visual processing, transformers in general are not your friend. Transformers need information to come in specific, structured sequences of data. Sure, Vision Transformers can provide this, but they aren't very effective compared to other models. Transformers are also a poor choice when you only have a limited data set and can only build a small model; just like a human, a model needs a lot of "training" to gain context about something. Transformers always need large amounts of training data, which makes them suitable for big-scale projects but not small ones.
There are several other types of networks, such as Convolutional Neural Networks (CNNs, used for image processing), Recurrent Neural Networks (RNNs, used when the data set and the model need to be smaller), Kolmogorov-Arnold Networks (KANs, which are fairly new and are an alternative to Multi-Layer Perceptrons), the Neocognitron (used for pattern-recognition tasks; it inspired the CNN), and many more.
*Note: CNNs and RNNs can't be pinpointed to a single creator, so I linked IBM articles that discuss each one in depth.