A Mathematical Explanation of Gradient Descent

Example of a close-to-optimal gradient descent for a quadratic function. Credit: Dylan

Introduction

Although I recently wrote an article explaining gradient descent, I feel a mathematical explanation of how it works would be beneficial, not only for understanding the algorithm itself but also for seeing how its update rules are derived.

Explanation

First, we need a way to calculate how far off the model is. For our example, we will use the L2 loss function:

L = (1/N) Σᵢ₌₁ᴺ (yᵢ − ŷᵢ)²

Let N represent the total number of data points we have, i the index of the data point we are on, yᵢ the expected output, and ŷᵢ the output predicted by the model.

Let's look more closely at ŷ. Since ŷ is the predicted value, it is the function whose parameters need to be changed. Our goal is to change these parameters enough to minimize the loss of ŷ. For our example, let's set ŷ to be a line:

ŷ = f(x) = ax + b

The values a and b are our parameters. These will be the values we will have to change within f(x) to chang...
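To make the setup above concrete, here is a small Python sketch of the L2 loss for the linear model ŷ = ax + b, together with one gradient-descent update of a and b. The data points, learning rate, and step count are made-up illustrative values, and the gradient formulas shown in the comments follow from differentiating the L2 loss with respect to a and b.

```python
# Illustrative sketch: L2 loss and gradient descent for y_hat = a*x + b.
# The data, learning rate, and iteration count are arbitrary example values.

def l2_loss(a, b, xs, ys):
    """Mean of squared errors (yi - y_hat_i)^2 over all N data points."""
    n = len(xs)
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / n

def gradient_step(a, b, xs, ys, lr=0.02):
    """One gradient-descent update of the parameters a and b.

    Uses the analytic gradients of the L2 loss:
      dL/da = (2/N) * sum((y_hat_i - y_i) * x_i)
      dL/db = (2/N) * sum(y_hat_i - y_i)
    """
    n = len(xs)
    grad_a = (2 / n) * sum(((a * x + b) - y) * x for x, y in zip(xs, ys))
    grad_b = (2 / n) * sum(((a * x + b) - y) for x, y in zip(xs, ys))
    # Step against the gradient to reduce the loss.
    return a - lr * grad_a, b - lr * grad_b

# Example: fit noisy points lying near the line y = 3x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 3.9, 7.2, 9.8]
a, b = 0.0, 0.0
for _ in range(2000):
    a, b = gradient_step(a, b, xs, ys)
```

After enough iterations the parameters settle close to the slope and intercept of the underlying line, which is exactly the behavior the derivation below explains.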