🚧 Under Construction 🚧

This post serves as a one-stop shop for the ML research I have read and want to read. The resources are organized into categories and preceded by a check mark (✔️) if I have read them. I also add a short blurb with my thoughts on each paper.

Fundamentals

(In the voice of a 60-year-old basketball coach) “…fundamentals.”

I have been very vocal about my support for this text. It is intuitive without shying away from the mathematics, and it enabled me to write both the forward and backward pass of an ANN entirely from scratch.
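In case it helps anyone attempting the same exercise, here is roughly what that from-scratch forward and backward pass boils down to for a one-hidden-layer network. This is my own minimal sketch (my own names, shapes, and hyperparameters), not the book's code:

```python
import numpy as np

# Minimal sketch: one hidden layer, sigmoid activations, squared-error loss.
# All names and shapes are my own, not taken from the book.

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))          # 4 samples, 3 features
y = rng.normal(size=(4, 1))          # regression targets

W1 = rng.normal(size=(3, 5)) * 0.1   # input -> hidden
W2 = rng.normal(size=(5, 1)) * 0.1   # hidden -> output

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(100):
    # Forward pass
    h = sigmoid(X @ W1)              # hidden activations
    y_hat = h @ W2                   # predictions
    loss = 0.5 * np.mean((y_hat - y) ** 2)

    # Backward pass (chain rule, by hand)
    d_yhat = (y_hat - y) / len(X)    # dL/dy_hat
    dW2 = h.T @ d_yhat               # dL/dW2
    d_h = d_yhat @ W2.T              # dL/dh
    dW1 = X.T @ (d_h * h * (1 - h))  # dL/dW1, using sigmoid'(z) = h(1-h)

    # Plain gradient descent update
    W1 -= 0.5 * dW1
    W2 -= 0.5 * dW2
```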

I read this textbook so I could build a bot in vanilla Python that learns to play the mobile game 2048. The next summer, the authors won the Turing Award!

Seminal Architectures

Modern deep learning does have its superstars.

I love variational auto-encoders. They seem so… right. When I programmed one from scratch, I had to sit down and derive the gradients by hand. That was a pain.
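For anyone wondering what "deriving the gradients" entails here: the step that makes a VAE trainable at all is the reparameterization trick, which pulls the sampling randomness outside the differentiable path. A minimal sketch with my own toy names, assuming the usual Gaussian encoder and standard-normal prior:

```python
import numpy as np

# Toy sketch of the reparameterization trick: instead of sampling
# z ~ N(mu, sigma^2) directly (not differentiable w.r.t. mu, sigma),
# sample eps ~ N(0, 1) and set z = mu + sigma * eps, which is.

rng = np.random.default_rng(0)

mu, log_var = 0.3, -1.0                   # pretend these came from the encoder
sigma = np.exp(0.5 * log_var)

eps = rng.normal()                        # randomness pulled outside the graph
z = mu + sigma * eps                      # differentiable in mu and log_var

# KL(N(mu, sigma^2) || N(0, 1)) has a closed form, so that part of the
# ELBO gradient can be derived by hand:
kl = 0.5 * (mu**2 + np.exp(log_var) - log_var - 1.0)
d_kl_d_mu = mu
d_kl_d_log_var = 0.5 * (np.exp(log_var) - 1.0)
```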

TO READ: No list would be complete without the paper to rule them all. Probably the only things that will even read this blog are transformer-based language models.

Text-based image generation is the reason I study deep learning. After reading this paper, I was able to make a latent diffusion model using my previously built from-scratch neural network and VAE. Training on CPU alone was brutal.
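If you are curious what building one involves, the training step itself is small once you have a network and a VAE: noise a latent at a random timestep and regress the noise that was added. A toy, heavily simplified sketch (the linear "denoiser" below is a stand-in I made up, not a real model):

```python
import numpy as np

# Toy sketch of one diffusion training step in latent space:
# corrupt a VAE latent with noise at a random timestep, then train a
# model to predict that noise. Names and the linear "model" are my own.

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)            # noise schedule
alpha_bars = np.cumprod(1.0 - betas)          # cumulative signal fraction

z0 = rng.normal(size=(16,))                   # pretend: a VAE latent
t = rng.integers(T)                           # random timestep
eps = rng.normal(size=z0.shape)               # noise to add (and predict)

# Forward (noising) process in closed form:
zt = np.sqrt(alpha_bars[t]) * z0 + np.sqrt(1.0 - alpha_bars[t]) * eps

# Stand-in denoiser: in reality this is a neural net conditioned on t
W = rng.normal(size=(16, 16)) * 0.01
eps_pred = zt @ W

loss = np.mean((eps_pred - eps) ** 2)         # simple noise-prediction loss
grad_W = 2.0 * np.outer(zt, eps_pred - eps) / eps.size
W -= 1e-2 * grad_W
```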

TO READ

Language Modeling

Language modeling is both the most promising subfield of AI and the most industry-relevant.

This served as my first proper introduction to modern language modeling. It was a long read, but well worth it. All the different design choices inspired me to make my own and try them out. A post on that effort is coming soon.

It’s hard to overstate how influential this paper was. Unhobbling large language models by providing a chain of thought turned out to be a whole new paradigm in scaling intelligence.
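For concreteness, the entire mechanism fits in the prompt: give the model an exemplar that shows its reasoning before the answer, and it tends to do the same. A toy illustration (the wording below is mine, not an excerpt from the paper):

```python
# Toy illustration of chain-of-thought prompting: the few-shot exemplar
# includes its reasoning, so the model is nudged to reason before answering.
# Problems and wording are made up by me.

standard_prompt = (
    "Q: A library shelf holds 12 books. 5 are checked out and 3 are "
    "returned. How many books are on the shelf?\n"
    "A: 10\n"
    "Q: A farm has 8 hens. Each lays 2 eggs, and 5 eggs break. "
    "How many eggs are left?\n"
    "A:"
)

cot_prompt = (
    "Q: A library shelf holds 12 books. 5 are checked out and 3 are "
    "returned. How many books are on the shelf?\n"
    "A: Start with 12. 12 - 5 = 7. 7 + 3 = 10. The answer is 10.\n"
    "Q: A farm has 8 hens. Each lays 2 eggs, and 5 eggs break. "
    "How many eggs are left?\n"
    "A:"  # the model is expected to show its work before answering
)
```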

This paper pretty much convinced me there has to be something better than the transformer if it’s dumping most of its attention onto the first token. It just seems very… hacky.
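A quick toy illustration of why a sink shows up at all: attention weights are a softmax, so they must sum to one, and any mass a head doesn't need has to land somewhere. The numbers below are made up purely to show the normalization effect:

```python
import numpy as np

# Toy illustration: attention weights are a softmax over scores, so they
# always sum to 1. If a head learns a large score for position 0, that
# token soaks up whatever attention isn't needed elsewhere.
# The scores below are made up for illustration only.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

scores_no_sink = np.array([0.1, 0.2, 0.0, 0.1])     # nothing stands out
scores_with_sink = np.array([4.0, 0.2, 0.0, 0.1])   # big score on token 0

print(softmax(scores_no_sink))    # mass spread roughly evenly
print(softmax(scores_with_sink))  # most of the mass lands on the first token
```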

When I read this paper I was like ohhhhhhhhh now I get how these things work. Finally.

TO READ

Training Theory and Improvements

I am particularly interested in the empirical laws that govern neural network training. Theory without practice something something.

First, why would we want to train larger models? Answer: compute and search win. I still have professors that haven’t learned the bitter lesson.

If you have ever sat through an introductory machine learning course, you have seen the overfitting diagram with polynomial regression. It tells such a simple story, and yet it is so obviously wrong in the modern era. If that story were the whole truth, why are we training models with trillions of parameters? I gave a full presentation on this paper to the NU AI Journal Club for just that reason.
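For reference, that diagram takes about ten lines to reproduce: fit polynomials of increasing degree to a small noisy dataset and watch test error climb while train error falls. A sketch with made-up data:

```python
import numpy as np

# Reproduce the classic "overfitting" picture: fit polynomials of
# increasing degree to noisy data and watch test error turn back up.
# Data and degrees are made up for illustration.

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)

x_train = rng.uniform(0, 1, 15)
y_train = f(x_train) + rng.normal(0, 0.2, 15)
x_test = rng.uniform(0, 1, 200)
y_test = f(x_test) + rng.normal(0, 0.2, 200)

for degree in [1, 3, 9, 14]:
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train {train_err:.3f}  test {test_err:.3f}")
```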

Deep learning is just complicated bogosort.

One of the most impressive things in all of machine learning is OpenAI predicting, in advance, the final performance of a model that would train for months. This paper was critical when my friend Dan and I created our own language model.
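Mechanically, that prediction is "just" a curve fit: estimate a power law plus an irreducible-loss offset from small or early runs, then extrapolate. A toy sketch on synthetic numbers (not anyone's real data, and not the paper's exact functional form):

```python
import numpy as np
from scipy.optimize import curve_fit

# Toy sketch of the "predict the final loss" trick: fit a power law with
# an irreducible-loss offset to the cheap end of a loss-vs-compute curve,
# then extrapolate. All numbers below are synthetic.

def power_law(c, a, b, l_inf):
    return a * c ** (-b) + l_inf

compute = np.logspace(0, 3, 30)                       # small-budget runs
true = power_law(compute, a=5.0, b=0.3, l_inf=1.7)    # pretend measurements
observed = true + np.random.default_rng(0).normal(0, 0.02, compute.shape)

params, _ = curve_fit(power_law, compute, observed, p0=(1.0, 0.5, 1.0))
a, b, l_inf = params

# Extrapolate to 100x more compute than any of the fitted runs used
print("predicted loss at 1e5:", power_law(1e5, a, b, l_inf))
```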

The interlinking of double descent, scaling laws, and model capacity was fascinating, and the plots on the first page perfectly conveyed the paper’s results.

The word Grok is ruined now, but it was nice while it lasted.

TO READ

How do I train my model?

The actual training of neural networks requires more than just the gradients. So many of deep learning’s modifications exist to make properly training larger models possible.

From what I know, it’s been unchallenged for over a decade. I implemented this one myself, and wasn’t able to get my VAE to learn anything without it.

TO READ

TO READ

TO READ

A really good machine learning paper is as simple as it is effective.

TO READ

TO READ

TO READ

Improvements to the Classics

This section contains various improvements to architectures.

TO READ

TO READ

Specific Models

Some models get all the attention.

TO READ

TO READ

My original inspiration for learning ML was the success of self-play algorithms. My friend Dan and I replicated this paper with tic-tac-toe in grad school.

TO READ

Alternative Architectures

Neural networks are amazing, but there are other architectures out there. Can anything do better?

This pair of papers represents one of the most exciting lines of research I’ve read. Even with the huge cost in training time, the insane speed of inference should catch any researcher’s eye. I plan on making an animation of LGN learning because I think it will look cool.
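As I understand this line of work (and I'm going from memory, so treat the details as my assumptions), each gate during training is a differentiable mixture over candidate Boolean operations relaxed to real-valued inputs, and at inference you snap to the single most likely gate. A toy one-gate sketch with only four candidate ops instead of the full set:

```python
import numpy as np

# Toy sketch of one "soft" logic gate: relax a few Boolean ops to work on
# reals in [0, 1], mix them with a softmax over learnable logits, and pick
# the argmax op for fast, purely Boolean inference. My simplification of
# the idea, not the papers' full formulation.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gate_ops(a, b):
    return np.array([
        a * b,              # AND
        a + b - a * b,      # OR
        a + b - 2 * a * b,  # XOR
        1.0 - a,            # NOT A
    ])

logits = np.array([0.1, 0.4, 2.0, -0.3])   # learnable: which op is this gate?

def soft_gate(a, b):
    # Differentiable training-time output: softmax-weighted mix of all ops
    return softmax(logits) @ gate_ops(a, b)

def hard_gate(a, b):
    # Inference-time output: just the single most likely op
    return gate_ops(a, b)[np.argmax(logits)]

print(soft_gate(0.9, 0.2), hard_gate(1.0, 0.0))
```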

Safety & the Future

The future is in our hands (really, the hands of ~500 people who aren’t me).

Instrumental convergence is such an important idea that I wrote a whole blog post on it!

TO READ

TO READ

Biology and Evolution

Neural networks are loosely (loosely) based on the brain. What about algorithms that are more closely linked with brain function?

This is another very exciting direction of research. The global updates of backprop feel artificial, and predictive coding’s focus on local updates is intriguing.
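To make "local" concrete: in the predictive coding formulations I've seen, every update depends only on a layer's own prediction error and its immediate neighbors, with no global backward sweep. A toy two-layer sketch in my own notation, very much a simplification:

```python
import numpy as np

# Toy sketch of predictive-coding-style local updates (my simplification):
# each layer keeps a value node x_l and a prediction error
# eps_l = x_l - W_l @ f(x_{l-1}). Value nodes relax to reduce the total
# squared error, then each weight matrix updates using only the error it
# directly produces and the activity of the layer just below it.

rng = np.random.default_rng(0)
f = np.tanh
df = lambda x: 1.0 - np.tanh(x) ** 2

x0 = rng.normal(size=3)             # input (clamped)
x2_target = rng.normal(size=2)      # output (clamped to the label)
W1 = rng.normal(size=(4, 3)) * 0.1  # predicts layer 1 from layer 0
W2 = rng.normal(size=(2, 4)) * 0.1  # predicts layer 2 from layer 1

x1 = W1 @ f(x0)                     # initialize the hidden value node

for _ in range(50):                 # inference: relax the hidden activity
    eps1 = x1 - W1 @ f(x0)
    eps2 = x2_target - W2 @ f(x1)
    # local update for x1: its own error minus error fed back from above
    x1 -= 0.1 * (eps1 - df(x1) * (W2.T @ eps2))

# Learning: each weight update uses only locally available signals
eps1 = x1 - W1 @ f(x0)
eps2 = x2_target - W2 @ f(x1)
W1 += 0.01 * np.outer(eps1, f(x0))
W2 += 0.01 * np.outer(eps2, f(x1))
```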

TO READ

Miscellaneous

And finally, the papers that don’t fit anywhere else.

TO READ

Other resources

Research papers are great, but there are many other ways to keep up with the concepts and pace of the field.

TO WRITE

This is great for planning your next decoder-only transformer!