Posts by Year

2025

Policy Gradient

1 minute read

Background: Policy gradient is one of the methods in reinforcement learning that directly computes the gradient of the objective function and uses it to updat...

Why behavioral cloning is not enough

3 minute read

Background: One of the basic approaches to imitation learning is behavioral cloning. However, behavioral cloning is not guaranteed to work well. The behavi...

nanoGPT-training

4 minute read

This post was written after watching Andrej Karpathy’s "Let’s reproduce GPT-2 (124M)" video. I followed along with most of the acceleration techniques on Windows pla...

Online Softmax

2 minute read

This post shows the basic derivation from the traditional softmax to the safe softmax and then to the online safe softmax. The idea was first proposed by the engineers from...

nanoGPT-Models

5 minute read

Watching the nanoGPT series by Andrej Karpathy was really inspiring, and the videos provided a nice introduction to reproducing GPT-2 using the transform...
