Generalized Advantage Estimation (GAE)
Background From our last post, we derived the basic policy gradient method. However, the basic policy gradient method can really suffer from high variance i...
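Since the excerpt is cut off, here is a minimal sketch of the GAE recursion this post presumably builds toward: the TD residual δ_t = r_t + γV(s_{t+1}) − V(s_t) is accumulated backwards with decay γλ. The function name and the default values of `gamma` and `lam` below are illustrative, not taken from the post.

```python
import numpy as np

def compute_gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for a single rollout.

    rewards: array of length T
    values:  array of length T + 1 (includes a bootstrap value for the final state)
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    # Accumulate the exponentially weighted sum of TD residuals backwards:
    #   delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    #   A_t     = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

The backward pass makes the whole computation O(T); episode-termination masks are omitted here for brevity.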
Background Policy gradient is a reinforcement learning method that directly computes the gradient of the objective function and uses it to updat...
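For reference, the standard score-function (REINFORCE) form of this gradient, which the truncated excerpt is presumably describing, is:

$$
\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau)\right]
$$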
Background One of the basic approaches to imitation learning is behavioral cloning. However, behavioral cloning is not guaranteed to work well. The behavi...
This post is written after watching Andrej Karpathy’s Let’s reproduce GPT-2 (124M) video. I followed along with most of the acceleration techniques on Windows pla...
This post shows the basic derivation from the traditional softmax to the safe softmax and then the online safe softmax. The idea was first proposed by the engineers from...
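Since the excerpt is truncated, here is a minimal sketch of the one-pass online safe softmax idea it refers to: keep a running maximum and rescale the running sum of exponentials whenever the maximum increases. This is an illustrative reconstruction, not the post's own code.

```python
import numpy as np

def online_safe_softmax(x):
    """Numerically safe softmax whose normalizer is built in a single pass."""
    m = -np.inf  # running maximum seen so far
    s = 0.0      # running sum of exp(x_i - m), kept consistent with the current m
    for xi in x:
        m_new = max(m, xi)
        # Rescale the old sum when the running maximum increases.
        s = s * np.exp(m - m_new) + np.exp(xi - m_new)
        m = m_new
    # A second pass produces the outputs; a fused kernel would stream these too.
    return np.exp(np.asarray(x, dtype=float) - m) / s
```

The rescaling step is what lets the normalizer be computed without first scanning the whole input for its maximum, which is the property that matters for streaming and fused-attention kernels.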
Watching the nanoGPT series by Andrej Karpathy was really inspiring, and the videos provided a really nice introduction to reproducing GPT-2 using the transform...
Some notes on basic Makefile usage.