Optimizing the Learning Rate of Your Neural Networks

The learning rate is one of those pesky hyperparameters that we, as data scientists, have to fine-tune. It is the rate at which a model's weights are updated in response to the current error. For our purposes here, we'll assume the loss function is convex and continuous (if the loss function is not convex, you can run into local minima that cause all sorts of problems, but that's a story for another day). In other words, if we tried every possible value for a weight and plotted the loss at each point, we'd get a curve like the one below.
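To make that concrete, here's a minimal sketch (the one-parameter model, the data point, and the learning rate value are all made up for illustration). It applies the learning-rate update rule `w -= lr * grad` to a single weight, then sweeps that weight across a range of values and plots the loss at each point:

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy one-parameter model: predict y_hat = w * x with squared-error loss.
# The data point and learning rate below are made up for illustration.
x, y = 2.0, 4.0   # one training example (the true weight is 2.0)
lr = 0.05         # the learning rate: how far each update moves w

# Gradient descent: nudge the weight against the gradient of the loss,
# scaled by the learning rate.
w = 0.0
for _ in range(25):
    grad = 2 * x * (w * x - y)   # d/dw of (w * x - y)^2
    w -= lr * grad               # the learning-rate update rule
print(f"weight after training: {w:.4f}")  # approaches 2.0

# "Try every possible weight and plot the loss at each point":
weights = np.linspace(-1.0, 5.0, 200)
losses = (weights * x - y) ** 2
plt.plot(weights, losses)
plt.xlabel("weight w")
plt.ylabel("loss")
plt.show()
```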