Generalized Advantage Estimation (GAE)
Background From our last post, we derived the basic policy gradient method. However, the basic policy gradient method can really suffer from high variance i...
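Since the excerpt is cut off, here is a minimal sketch of the GAE recursion this post presumably builds toward: the TD residual δ_t = r_t + γV(s_{t+1}) − V(s_t) is accumulated backwards with decay γλ. The function name and the default values of `gamma` and `lam` below are illustrative, not taken from the post.

```python
import numpy as np

def compute_gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for a single rollout.

    rewards: array of length T
    values:  array of length T + 1 (includes a bootstrap value for the final state)
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    # Accumulate the exponentially weighted sum of TD residuals backwards:
    #   delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    #   A_t     = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

The backward pass makes the whole computation O(T); episode-termination masks are omitted here for brevity.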
Background Policy gradient is a reinforcement learning method that directly computes the gradient of the objective function and uses it to updat...
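For reference, the standard score-function (REINFORCE) form of this gradient, which the truncated excerpt is presumably describing, is:

$$
\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau)\right]
$$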
Background One of the basic approaches to imitation learning is behavioral cloning. However, behavioral cloning is not guaranteed to work well. The behavi...
This post is written after watching Andrej Karpathy’s Let’s reproduce GPT-2 (124M) video. I followed along with most of the acceleration techniques on Windows pla...
This post shows the basic derivation from the traditional softmax to the safe softmax and then the online safe softmax. The idea was first proposed by the engineers from...
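Since the excerpt is truncated, here is a minimal sketch of the one-pass online safe softmax idea it refers to: keep a running maximum and rescale the running sum of exponentials whenever the maximum increases. This is an illustrative reconstruction, not the post's own code.

```python
import numpy as np

def online_safe_softmax(x):
    """Numerically safe softmax whose normalizer is built in a single pass."""
    m = -np.inf  # running maximum seen so far
    s = 0.0      # running sum of exp(x_i - m), kept consistent with the current m
    for xi in x:
        m_new = max(m, xi)
        # Rescale the old sum when the running maximum increases.
        s = s * np.exp(m - m_new) + np.exp(xi - m_new)
        m = m_new
    # A second pass produces the outputs; a fused kernel would stream these too.
    return np.exp(np.asarray(x, dtype=float) - m) / s
```

The rescaling step is what lets the normalizer be computed without first scanning the whole input for its maximum, which is the property that matters for streaming and fused-attention kernels.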
Watching the nanoGPT series by Andrej Karpathy was really inspiring, and the videos provided a really nice introduction to reproducing GPT-2 using the transform...
Some notes on basic Makefile usage.