Deep Learning :: SAO Blog

Visual Language Models, with PaliGemma as a Case Study

May 22, 2025, 02:35 PM 45 min read

This post draws on Umar Jamil’s excellent video tutorial. Vision-language models can be grouped into four categories; this post uses PaliGemma to unpack VLM architecture and implementation details.

Deep Learning, Multimodal

From RL to RLHF

May 8, 2025, 02:15 PM 50 min read

These are study notes based primarily on Umar Jamil's course. The goal is to align LLM behavior with desired outputs, and RLHF is one of the best-known techniques for doing so.

Deep Learning, RLHF, LLM

The Intuition and Mathematics of Diffusion

Dec 13, 2024, 10:02 AM 40 min read

Build intuition for diffusion models and work through their mathematical derivation, from the forward process to the reverse process, covering the core ideas and implementation details of DDPM.

Deep Learning, Diffusion

Let's build AlphaZero

Nov 26, 2024, 02:07 PM 35 min read

Starting from the design principles of AlphaGo and diving deep into the core mechanisms of MCTS and Self-Play, we reveal step-by-step how to build an AI Gomoku system that can surpass human capabilities.

Deep Learning, Reinforcement Learning, MCTS, Self-Play

PPO Speedrun

Nov 14, 2024, 07:31 AM 25 min read

Quickly understand the core ideas and implementation details of the PPO (Proximal Policy Optimization) algorithm, and master this important method in modern reinforcement learning.

RL, PPO, Deep Learning

Introduction to Knowledge Distillation

Nov 3, 2024, 02:56 PM 35 min read

Learn the basic principles of Knowledge Distillation and how to transfer knowledge from large models (teachers) to small models (students) for model compression and acceleration.

Deep Learning, Knowledge Distillation

Vector Add in Triton

Sep 19, 2024, 03:06 PM 20 min read

Starting from simple vector addition, learn how to write Triton kernels and explore performance tuning techniques.

Triton, Deep Learning, AI

Softmax in OpenAI Triton

Sep 14, 2024, 05:41 PM 30 min read

Learn how to write efficient GPU kernels using OpenAI Triton, implementing the Softmax operation and understanding Triton's programming model.

Triton, Deep Learning, Python

History of LLM Evolution (5): Building the Path of Self-Attention — The Future of Language Models from Transformer to GPT

Mar 20, 2024, 08:49 AM 60 min read

Build the Transformer architecture from scratch to understand its core components: self-attention, multi-head attention, residual connections, and layer normalization.

LLM, GPT, Deep Learning, Transformer

History of LLM Evolution (4): WaveNet — Convolutional Innovation in Sequence Models

Mar 9, 2024, 04:01 PM 30 min read

Learn the progressive fusion concept of WaveNet and implement a hierarchical tree structure to build deeper language models.

AI, Deep Learning, LLM

History of LLM Evolution (3): Batch Normalization — Statistical Harmony of Activations and Gradients

Feb 29, 2024, 03:44 PM 35 min read

Understand the activation and gradient issues that arise in neural network training, and learn how batch normalization addresses the challenges of training deep networks.

Deep Learning, AI

History of LLM Evolution (2): Embeddings — MLPs and Deep Language Connections

Feb 17, 2024, 09:48 PM 25 min read

Exploring Bengio's classic paper to understand how neural networks learn distributed representations of words and how to build a Neural Probabilistic Language Model (NPLM).

AI, LLM, Deep Learning, Embeddings, Neural Networks

History of LLM Evolution (1): The Simplicity of Bigram

Feb 17, 2024, 11:05 AM 20 min read

Start with the simplest Bigram model to explore the foundations of language modeling. Learn how to predict the next character through counting and probability distributions, and how to achieve the same result with a neural network.

AI, Deep Learning, LLM, Language Models

Building a Minimal Autograd Framework from Scratch

Feb 16, 2024, 10:28 AM 25 min read

Learning from Andrej Karpathy's micrograd project, we build an automatic differentiation framework from scratch to deeply understand the core principles of backpropagation and the chain rule.

Deep Learning, AI, PyTorch, Autograd, Neural Networks

#Deep Learning