Nagi-ovo

Breezing

Let's build AlphaZero

This article digests my understanding of Sunrise's "Understanding AlphaZero, MCTS, Self-Play, and UCB from Scratch" along with related articles, video tutorials, and code implementations. Starting from the design principles of AlphaGo, it digs into the two core mechanisms, MCTS and Self-Play, to gradually reveal how to build a superhuman…
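Since the excerpt centers on MCTS and UCB, here is a minimal sketch of the PUCT selection rule that AlphaZero-style search uses when descending the tree; the function name and the `c_puct` constant are illustrative, not taken from the post:

```python
import math

def puct_score(parent_visits, child_visits, child_value_sum, prior, c_puct=1.5):
    """AlphaZero-style PUCT: mean value (exploitation) plus a prior-weighted
    bonus that shrinks as the child accumulates visits (exploration)."""
    q = child_value_sum / child_visits if child_visits > 0 else 0.0
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u

# Selection step of MCTS: descend to the child maximizing this score, e.g.
# best = max(children, key=lambda ch: puct_score(node.visits, ch.visits,
#                                                ch.value_sum, ch.prior))
```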

"Speedrunning" PPO

Proximal Policy Optimization: we finally arrive at one of the RL algorithms that has been hot in NLP these past few years. In on-policy algorithms, the policy used to collect data is the same as the policy being trained; the problem is that each batch of data must be thrown away after a single use and collected all over again, so training is slow. The intuition behind PPO …
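As a taste of that intuition, here is a minimal sketch of the PPO clipped surrogate objective (tensor names are illustrative; `clip_eps=0.2` is the paper's default):

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective from the PPO paper, negated for minimization."""
    ratio = torch.exp(logp_new - logp_old)  # pi_theta(a|s) / pi_theta_old(a|s)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```

The clip keeps the new policy from drifting too far from the data-collecting policy, which is what lets PPO reuse each batch for several gradient steps.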

An Introduction to Knowledge Distillation

This article tries to combine an introductory demo, the Knowledge Distillation Tutorial from the PyTorch Tutorials, with more advanced material, MIT 6.5940 Fall 2024 TinyML and Efficient Deep Learning Computing…
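For orientation, this is the core loss that tutorials like the PyTorch one build around: a temperature-softened KL term against the teacher blended with ordinary cross-entropy on the labels. A minimal sketch; `T` and `alpha` are typical illustrative values, not the tutorial's exact settings:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-target KL (scaled by T^2, as in Hinton et al.) plus hard-label CE."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```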

The Journey of Cracking a Follow Invite Code

While zoning out in a filler class I was browsing Follow's Discord hoping to snag an invite code, but my hands were too slow: codes given away outright, or with only a single digit masked, were always gone by the time I saw them. Then I spotted the puzzle below: a file link whose QR code decodes to "the invite code is hidden 'inside' the image". OK, challenge accepted 🤓. Even though I knew it was unlikely to be simply hidden visually…
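A common first move for this kind of puzzle, purely as a hypothetical sketch and not necessarily the route the post actually takes, is to check whether anything is appended after the image's end-of-file marker, e.g. after the PNG IEND chunk:

```python
def trailing_bytes(path):
    """Return whatever follows the PNG IEND chunk; non-empty means
    extra data was concatenated onto the image file."""
    data = open(path, "rb").read()
    idx = data.rfind(b"IEND")
    assert idx != -1, "not a PNG"
    return data[idx + 8:]  # skip the 4-byte chunk type + 4-byte CRC

extra = trailing_bytes("puzzle.png")  # hypothetical filename
print(len(extra), extra[:16])
```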

A First Look at Actor-Critic Methods

The variance problem: Policy Gradient methods draw attention for being intuitive and effective. We previously explored the REINFORCE algorithm, which performs well on many tasks. However, REINFORCE relies on Monte Carlo sampling to estimate the return, meaning we need an entire episode of data to compute it…
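The excerpt's point about variance can be made concrete: an actor-critic replaces the Monte Carlo return with a one-step TD target from a learned value function. A minimal sketch, with illustrative tensor names:

```python
import torch

def actor_critic_loss(logp_action, value_s, value_next, reward, done, gamma=0.99):
    """One-step TD error as the advantage: far lower variance than waiting
    for a full-episode Monte Carlo return."""
    td_target = reward + gamma * value_next * (1.0 - done)
    advantage = (td_target - value_s).detach()   # no gradient through the baseline
    actor_loss = -logp_action * advantage        # policy-gradient term
    critic_loss = (td_target.detach() - value_s) ** 2
    return (actor_loss + 0.5 * critic_loss).mean()
```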

从 DQN 到 Policy Gradient

Review: Q-Learning is an algorithm for training a Q function, the action-value function that determines the value of taking a particular action in a particular state. It maintains a Q-table as a memory of the value of every state-action pair. For an Atari game like Space Invaders…
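The tabular update behind that Q-table, as a minimal sketch (the table sizes are illustrative):

```python
import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next, done, lr=0.1, gamma=0.99):
    """Q(s,a) <- Q(s,a) + lr * (r + gamma * max_a' Q(s',a') - Q(s,a))"""
    target = r + (0.0 if done else gamma * Q[s_next].max())
    Q[s, a] += lr * (target - Q[s, a])
```

The Atari point follows directly: with raw-pixel states the table becomes astronomically large, which is what motivates approximating Q with a neural network (DQN).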

Reinforcement Learning Basics and Q-Learning

In this year's Kaggle competition I used the DeepSeek-Math-7B-RL model, and while studying I treated Claude 3.5 Sonnet as my teacher; both models owe their strength to RL. I had a vague sense that the techniques in this field are powerful and beautiful, so I decided to get into it, but my shaky fundamentals meant I couldn't make sense of OpenAI Spinning…

LoRA in PyTorch

This article summarizes what I learned from GitHub - hkproj/pytorch-lora. I had used LoRA fine-tuning through the peft library many times and knew the rough principle, but had never implemented it by hand, so this course really hit the spot for me (classic ADHD: knowledge feels uncomfortable until fully digested). Fine-Tuning: the target is a pretrained model; the purpose…
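The idea the course implements, sketched minimally (the hyperparameters `r` and `alpha` are common illustrative defaults, not the repo's exact values): freeze the pretrained weight and learn only a low-rank update.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = x (W + (alpha/r) * B A)^T with W frozen; only A and B train."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features),
                                   requires_grad=False)            # frozen pretrained W
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(out_features, r))        # zero-init: update starts at 0
        self.scale = alpha / r

    def forward(self, x):
        return x @ (self.weight + self.scale * self.B @ self.A).T
```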

Vector Add in Triton

Single-threaded version: element-wise addition. Triton implementation: in Triton, the vector-addition kernel splits the vectors into blocks and has the program instances in the grid compute them in parallel, which makes the operation efficient. Each program instance loads the corresponding elements of the two vectors, adds them, and stores the result. Core steps…
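Those core steps look roughly like the following, a sketch in the spirit of the official Triton vector-add tutorial (`BLOCK_SIZE=1024` is an illustrative choice):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                    # which block this program owns
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                    # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x, y):
    out = torch.empty_like(x)
    grid = (triton.cdiv(x.numel(), 1024),)         # one program per block
    add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
    return out
```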

Softmax in OpenAI Triton

This article summarizes what I learned from the YouTube tutorials by @sotadeeplearningtutorials9598; thanks to the teacher's clear, approachable guidance, a beginner like me who had never touched GPU programming managed to write a first kernel with real-world effect. Softmax is a commonly used activation function…
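Softmax maps a vector of scores to a probability distribution, and the detail a kernel must get right is numerical stability. A minimal PyTorch reference for the row-wise math (a sketch of what gets ported to Triton, not the tutorial's exact code):

```python
import torch

def softmax(x):
    """Row-wise softmax; subtracting the row max first avoids exp() overflow."""
    x = x - x.max(dim=-1, keepdim=True).values
    e = torch.exp(x)
    return e / e.sum(dim=-1, keepdim=True)
```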

An Introduction to Policy Gradients

This article records my study of Andrej Karpathy's Deep RL Bootcamp and his blog post Deep Reinforcement Learning: Pong from Pixels. Progress in RL is not driven mainly by novel, astonishing ideas: the 2012…

Setting Up Ubuntu 20.04 on WSL2

I bought a new machine (3090 Ti) for deep learning and, after much hesitation between dual-booting and WSL, chose the latter. The reasons: I didn't want to wrestle with the disk and network configuration of a dual-boot setup, whereas WSL can share the host system's settings; I wanted to try WSL2, which I had known about for ages, but my old Legion laptop ran it too sluggishly to use for real work; and it stays out of the way, 2-3…

A History of LLM Evolution (6): Demystifying the Tokenizer

The tokenizer is an important, if not particularly fancy, component of an LLM. In the language models built earlier in this series, the tokenizer was character-level: an embedding table was built for all 65 characters that could appear, and an embedding layer encoded the training set into vectors. In practice, however…
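That character-level scheme fits in a few lines; a minimal sketch, assuming the Tiny Shakespeare corpus this series uses (hence the 65 distinct characters; `input.txt` is an illustrative filename):

```python
text = open("input.txt").read()
chars = sorted(set(text))          # 65 distinct characters for this corpus
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

# An nn.Embedding(len(chars), d_model) then maps each id to a trainable vector.
```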

A History of LLM Evolution (5): The Road to Self-Attention, from Transformer to GPT and the Future of Language Models

Prerequisites: the earlier micrograd and makemore series (optional); familiarity with Python; basic concepts from calculus and statistics. Goal: understand and appreciate how GPT works. Materials you may want: the Colab Notebook link, and a very detailed set of notes I came across on Twitter that is better than what I wrote. In…

The Way of Fine-Tuning

You have chosen to use an LLM for an NLP task: where do you start? The figure below makes it clear which approach fits your situation: with time and a large amount of data, you can retrain a model outright; with a moderate amount of data, you can fine-tune a pretrained model; with little data, the best choice is in-context learning…
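The cheapest branch of that decision, in-context learning, needs no training loop at all; a hypothetical few-shot prompt is enough to show the shape of it:

```python
# Steer a frozen LLM by putting labeled examples in the prompt itself.
prompt = """Classify the sentiment as positive or negative.

Review: The plot was dull and predictable. -> negative
Review: A heartfelt, beautifully shot film. -> positive
Review: I want those two hours of my life back. ->"""
# Sent to any completion API, this should come back "negative"
# without a single weight update.
```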

A History of LLM Evolution (4): WaveNet, a Convolutional Innovation in Sequence Models

Source code repository for this section. In the earlier parts we built a character-level multilayer-perceptron language model, and now it is time to make its architecture more complex. The goal is for the input sequence to take in more characters than the current 3. Beyond that, we don't want to put them all into a single hidden layer, which would compress too much information; the result is a deeper, WaveNet-like model…
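The structural trick that makes the model WaveNet-like is fusing consecutive embeddings in stages rather than all at once; a minimal sketch in the spirit of the makemore implementation:

```python
import torch.nn as nn

class FlattenConsecutive(nn.Module):
    """Concatenate every n consecutive token embeddings, so context is merged
    gradually over layers instead of squashed into one hidden layer."""
    def __init__(self, n):
        super().__init__()
        self.n = n

    def forward(self, x):            # x: (B, T, C)
        B, T, C = x.shape
        return x.view(B, T // self.n, C * self.n)

# With a block size of 8, three rounds of FlattenConsecutive(2) + Linear + Tanh
# merge characters pairwise: 8 -> 4 -> 2 -> 1 positions.
```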

A History of LLM Evolution (3): Batch Normalization, the Statistical Reconciliation of Activations and Gradients

The focus of this section is to build a deep impression and understanding of a neural network's activations during training, and especially of the gradients flowing backward through it. Understanding the history of these architectures matters because the RNN (recurrent neural network), as a universal approximator, can in principle implement any algorithm…
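For reference while reading, the batch-norm computation itself is small; a minimal sketch of the training-time forward pass (running statistics omitted):

```python
import torch

def batchnorm1d(x, gamma, beta, eps=1e-5):
    """Standardize each feature over the batch, then apply a learnable
    scale and shift; keeps pre-activations roughly unit Gaussian."""
    mean = x.mean(0, keepdim=True)
    var = x.var(0, keepdim=True, unbiased=False)
    xhat = (x - mean) / torch.sqrt(var + eps)
    return gamma * xhat + beta
```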

The State of GPT

This article organizes Andrej Karpathy's talk at Microsoft Build in May 2023. The slides are available at https://karpathy.ai/stateofgpt.pdf. The talk covers GPT's training pipeline, how far the field has come, and the current LLM…