Summary of LightOn AI meetup #14

WeightWatcher a Diagnostic Tool for Deep Neural Networks

It has been about a year since we started our (virtual) LightOn AI Meetups, and to celebrate the anniversary 🥳, we had Charles Martin as a guest. Charles is the Chief Scientist at Calculation Consulting, and he presented his work on WeightWatcher: a Diagnostic Tool for Deep Neural Networks, a Python package built around a series of papers.

The 📺 recording of the meetup is on LightOn's YouTube channel. Subscribe to the channel and to our Meetup to get notified of the next videos and events!

WeightWatcher is a Python package for analyzing trained models and inspecting models that are difficult to train 🏋️. It can be used to gauge improvements in model performance and to predict test accuracies across different models 🔮 (without ever looking at the data!). It can also detect potential problems when compressing or fine-tuning pre-trained models 🗜️.

It is based on ideas from Random Matrix Theory, Statistical Mechanics, and Strongly Correlated Systems. The main idea is to fit a power law to the tail of the empirical spectral density (ESD) of the layer weights. The power-law exponent α is what helps us detect potential problems.

Fitting a power law in log-log to the tail of the ESD needs to be done carefully!
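To make the idea concrete, here is a minimal NumPy sketch of the core computation: take a layer's weight matrix, form the empirical spectral density (the eigenvalues of the correlation matrix), and fit a power-law exponent α to its tail with a maximum-likelihood estimator. The random weight matrix, the choice of `xmin` as the median eigenvalue, and the MLE form are illustrative assumptions, not the package's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer weight matrix; in practice this comes from a trained model.
W = rng.normal(size=(512, 256)) / np.sqrt(256)

# Empirical spectral density (ESD): eigenvalues of the correlation matrix W^T W / N.
X = W.T @ W / W.shape[0]
eigs = np.linalg.eigvalsh(X)

def power_law_alpha(eigs, xmin):
    """Continuous power-law MLE (Clauset-style) for p(x) ~ x^{-alpha}, x >= xmin."""
    tail = eigs[eigs >= xmin]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

# Fit only the tail of the ESD; here xmin is simply the median eigenvalue.
alpha = power_law_alpha(eigs, xmin=np.quantile(eigs, 0.5))
print(f"alpha = {alpha:.2f}")
```

For a random Gaussian matrix the ESD follows the Marchenko–Pastur law rather than a heavy-tailed one; the point of the sketch is only the mechanics of the fit, and a careful analysis must also select `xmin` properly, which is exactly the pitfall the figure above warns about.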

Poorly trained models tend to have large per-layer α values, as can be seen, for example, by comparing GPT and GPT-2: essentially the same model trained on dirty versus well-curated data.

GPT is trained on dirtier data than GPT-2, and it shows in the unusually large α values for some of the layers.

In particular, a weighted α can predict the test accuracy of models within the same architecture series across varying depths, as well as across different architectures and regularization parameters 📉.

The correlation between test accuracy and weighted alpha is remarkable.
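A short sketch of how such a weighted α could be aggregated from per-layer fits. The formula below, averaging each layer's α scaled by log₁₀ of its largest ESD eigenvalue, is the assumed form from Martin and Mahoney's papers; the per-layer numbers are made up for illustration.

```python
import numpy as np

# Hypothetical per-layer power-law exponents and largest ESD eigenvalues,
# as a layer-by-layer analysis would report them.
alphas = np.array([2.5, 3.1, 2.8, 4.0])
lambda_max = np.array([12.0, 9.5, 15.2, 7.8])

# Assumed weighted-alpha metric: mean of alpha_l * log10(lambda_max_l) over layers.
# Lower values tend to correlate with higher test accuracy within a model series.
weighted_alpha = np.mean(alphas * np.log10(lambda_max))
print(f"weighted alpha = {weighted_alpha:.3f}")
```

Because it needs only the weights, a metric like this can rank checkpoints of the same architecture family without touching training or test data.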

Finally, there is some early research on extending this idea to decide when to perform early stopping 🛑, to set per-layer learning rates 🎛️, or to detect over-fitting 🔍. Quite a program! We look forward to even more insightful empirical metrics in Charles' WeightWatcher in the future.
