Posts

Showing posts from August, 2024

How a "GPT" Works

What is a GPT? In Artificial Intelligence, GPT stands for "Generative Pre-Trained Transformer." This means that the GPT (called the model) can "synthesize" an output for a user's input because it was previously trained on data and found the patterns hidden within it. The word "transformer" has a bit more backstory that we need to dive into. The transformer architecture was first introduced in 2017, in a paper titled "Attention Is All You Need" by eight Google scientists. That paper introduced a new way of training models: allowing them to dynamically focus on the important elements of the input, a technique called the attention mechanism (Vaswani et al., 2017). In a modern Transformer network (at least the one I will explain in this article), the last token is used to determine the next token/output. This token's value is modified using the context given by the user in previous interactions and the prompt itself. Steps i...
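As a rough illustration of the attention mechanism described above, here is a minimal sketch of scaled dot-product attention in NumPy. The matrix shapes, random values, and function names are illustrative assumptions, not the paper's full multi-head implementation:

```python
import numpy as np

def softmax(x):
    # Subtract the row max before exponentiating, for numerical stability
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores)  # each row sums to 1: how much each token attends to the others
    return weights @ V

# Toy example: 3 tokens with 4-dimensional embeddings (values are arbitrary)
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = attention(Q, K, V)
print(out.shape)  # (3, 4): one context-mixed vector per token
```

Each output row is a weighted blend of the value vectors, with the weights telling the model which other tokens matter most for that position.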

What is SoftMax and ArgMax?

What is a SoftMax? The function dates back to 1868, when physicist Ludwig Boltzmann derived what was then called the Boltzmann (or Gibbs) distribution for the kinetic theory of gases: a theory in physics that, as you guessed, is about gas. It has since become more useful in statistics and neural networks, where its purpose is to convert a set of raw scores into a probability distribution, which tells the probability from 0 to 1 of each token in a set being the next one. In some cases, the raw scores could be: like = 10.01, love = 7.4, hate = 5.1, distaste = 3.7 (Assume for our test case that the tokens are full words). As you noticed, the scores are not between 0 and 1. This is not very helpful, since the numbers do not add up to a value that is constant for every distribution created, which makes it hard to determine the real probability of the next word. That's where the SoftMax function comes in, x ...
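The scores from the excerpt above can be pushed through SoftMax directly. This is a small sketch using only the standard library; the token list and scores come from the post's own example:

```python
import math

def softmax(logits):
    # Exponentiate each score (shifted by the max for stability), then
    # normalize so the results sum to exactly 1
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Raw scores from the post's example
tokens = ["like", "love", "hate", "distaste"]
logits = [10.01, 7.4, 5.1, 3.7]

probs = softmax(logits)
for t, p in zip(tokens, probs):
    print(f"{t}: {p:.4f}")

# ArgMax simply picks the token with the highest score
best = tokens[max(range(len(logits)), key=logits.__getitem__)]
print(best)  # like
```

After SoftMax the four values sum to 1, with "like" taking roughly 0.92 of the probability mass; ArgMax then just selects that highest-probability token.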

What and how does Deep Learning Work?

What is Deep Learning? Technically, Deep Learning (DL) is a branch of Machine Learning (ML), which is itself a branch of Artificial Intelligence, a field often split into 7 different, but not entirely distinct, categories. Each category in fact intertwines with the others to reach the desired outcome. Deep Learning is a category in which a computer attempts to find a pattern within a set of data that a human could not find, in order to sort the information into the desired outputs. (Figure: one example of how people separate the AI field. Credit: Javatpoint.) To achieve such a result, computers use what is formally called an artificial neural network: a method that resembles how the human brain functions, using artificial neurons. An artificial neural network uses interconnected sets of these neurons to sort information and correctly assign it a value. There are, at minimum, 3 sets (layers) in the neural network: 1) Input Layer: This is where the input goes.  a) User-...
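The layered structure described above can be sketched as a tiny forward pass through an input layer, one hidden layer, and an output layer. The layer sizes, random weights, and ReLU activation here are illustrative assumptions, not a specific network from the post:

```python
import numpy as np

def relu(x):
    # A common activation function: pass positives through, zero out negatives
    return np.maximum(0, x)

def forward(x, params):
    # Input layer -> hidden layer: weighted sum plus bias, then activation
    h = relu(x @ params["W1"] + params["b1"])
    # Hidden layer -> output layer: another weighted sum plus bias
    return h @ params["W2"] + params["b2"]

rng = np.random.default_rng(42)
# 4 input neurons, 5 hidden neurons, 2 output neurons (sizes are arbitrary)
params = {
    "W1": rng.normal(size=(4, 5)), "b1": np.zeros(5),
    "W2": rng.normal(size=(5, 2)), "b2": np.zeros(2),
}

x = rng.normal(size=(1, 4))   # one input example with 4 features
y = forward(x, params)
print(y.shape)  # (1, 2): one score per output neuron
```

Training would then adjust `W1`, `b1`, `W2`, and `b2` so the outputs match the desired values, but the forward pass above is the basic shape of every layer-to-layer computation.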