nanoGPT-training
This post is written after watching Andrej Karpathy’s Let’s reproduce GPT-2 (124M) video. I followed most of the acceleration techniques on Windows pla...
This post shows the basic derivation from the traditional softmax to the safe softmax and the online safe softmax. The idea was first proposed by the engineers from...
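The online safe softmax mentioned above can be sketched in a few lines: instead of one pass to find the max and a second pass to exponentiate, a single pass maintains a running max and rescales the running sum whenever the max changes. This is a minimal illustrative sketch (the function name `online_safe_softmax` is my own, not from the post):

```python
import math

def online_safe_softmax(xs):
    # Single pass: track the running max m and the running sum s of
    # exp(x - m); when a new max appears, rescale s by exp(old_m - new_m).
    m = float("-inf")
    s = 0.0
    for x in xs:
        new_m = max(m, x)
        s = s * math.exp(m - new_m) + math.exp(x - new_m)
        m = new_m
    # Final normalization uses the global max and sum from the single pass.
    return [math.exp(x - m) / s for x in xs]
```

Subtracting the running max keeps every `exp` argument non-positive, so the computation stays numerically stable even for large inputs, while producing the same probabilities as the two-pass safe softmax.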
Watching the nanoGPT series by Andrej Karpathy was really inspiring, and the videos provided a nice introduction to reproducing GPT-2 using the transform...
Background Policy gradient is a reinforcement learning method that directly computes the gradient of the objective function and uses it to updat...
Background One of the basic approaches to imitation learning is behavioral cloning. However, behavioral cloning is not guaranteed to work well. The behavi...
Some notes on basic Makefile usage.