Optimizing the Learning Rate of your Neural Networks

If I have learned anything from watching online lectures about gradient descent, it is that the worse your drawings are, the more likely you are to be a math professor.
Source: fastai course, chapter 5. The loss steadily decreases, then rapidly rises.

Discriminative Learning Rates

A discriminative learning rate is when you train a neural net with different learning rates for different layers. As far as I can tell, the main application is fine-tuning a pre-trained model for your own classifier. A common example is taking something like resnet34, which was trained on a huge dataset, and adapting it to your own needs. In doing so, the original final classification layer is replaced with a randomly initialized one whose weights are learned over time. In a task like this, different layers will need different learning rates: the pre-trained layers typically need only small adjustments, while the new final layer has to learn from scratch.
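To make the idea concrete, here is a minimal sketch in plain Python, not the fastai implementation. It assumes the usual fine-tuning setup where earlier pre-trained layers get smaller learning rates than the freshly initialized final layer, and it spaces the rates geometrically between a minimum and a maximum. The helper `discriminative_lrs`, the layer group names, and the values 1e-5 and 1e-3 are all hypothetical choices for illustration.

```python
def discriminative_lrs(lr_min, lr_max, n_groups):
    """Return n_groups learning rates spaced geometrically from lr_min to lr_max.

    Geometric spacing is an assumption made for this sketch; any increasing
    sequence would illustrate the same idea.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    if n_groups == 1:
        # With a single group there is nothing to spread; use the largest rate.
        return [lr_max]
    # Constant multiplicative step between consecutive groups.
    ratio = (lr_max / lr_min) ** (1 / (n_groups - 1))
    return [lr_min * ratio ** i for i in range(n_groups)]


# Hypothetical layer groups, ordered from the input side to the output side.
layer_groups = ["early_layers", "middle_layers", "new_head"]

# Illustrative values: small rate for pre-trained layers, larger for the new head.
lrs = discriminative_lrs(1e-5, 1e-3, len(layer_groups))

# One dict per group. In a real model, "params" would hold that group's
# parameters rather than a placeholder name.
param_groups = [
    {"params": name, "lr": lr} for name, lr in zip(layer_groups, lrs)
]

for group in param_groups:
    print(f"{group['params']}: lr = {group['lr']:.1e}")
```

The list of dicts mirrors the per-parameter-group format that PyTorch optimizers accept, so with a real model you would pass each group's actual parameters under "params" instead of the placeholder names used here.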

Conclusions

Learning rates are an important part of optimizing a neural net efficiently. Recently, very effective methods have been developed for choosing them; some are simpler and require more intuition, while others are automatic but complicated to implement. Neural networks also benefit from different learning rates for different layers and epochs.
