How a "GPT" Works
What is a GPT? In artificial intelligence, GPT stands for "Generative Pre-Trained Transformer." This means that the GPT (called the model) can "synthesize" an output in response to a user's input because it was previously trained on data and found patterns hidden within it.

The word "transformer" has a bit more backstory that we need to dive into. The idea of the transformer was first introduced in 2017, in a paper titled "Attention Is All You Need" by eight Google scientists. This paper introduced a new way of training models: allowing them to dynamically focus on the most important elements of the input, a technique called the attention mechanism (Vaswani et al., 2017).

In a modern transformer network (at least the one I will explain in this article), the last token is used to determine the next token/output. This token's value is modified using the context given by the user in previous interactions and by the prompt itself. Steps i...