banner
Nagi-ovo

Nagi-ovo

Breezing homepage: [nagi.fun](nagi.fun)
github

AI

cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover

强化学习基础与Q-Learning

今年打 Kaggle 比赛用了 DeepSeek-Math-7B-RL 模型,学习时把 Claude 3.5 Sonnet 当作老师,这两个模型强大的原因都离不开 RL。隐约感觉这个领域的技术很强很美于是准备接触一下,奈何功底不扎实不好,看不懂 OpenAI Spinning…
cover
cover
cover
cover
cover
cover

Vector Add in Triton

单线程版本 逐元素相加: Triton 实现 在 Triton 中,向量加法内核通过将向量划分为多个块(blocks),并在每个 Grid 中的线程(threads)并行计算,实现高效的向量加法操作。每个线程负责加载两个向量中对应位置的元素,进行相加并存储结果。 核心步骤…
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover

LLM 演进史(六):揭开 Tokenizer 的神秘面纱

Tokenizer 是 LLM 中很重要但又没那么 fancy 的组件,在本系列之前的语言模型建模中,tokenizer 的实现方式是字符级的,将所有可能出现的 65 种字符制作嵌入表,然后用 embedding layer 对训练集进行编码向量化。而实践中…
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover

微调之道

选择 LLM 完成一个 NLP 任务,如何下手? 从下图中就能很好的明白哪个操作适合完成你当前的任务: 如果你有时间和大量数据,你完全可以重新训练模型;一定量的数据,可以对预训练模型进行微调;数据不多,最好的选择是 “in context learning”,上下文学习…
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover

LLM演进史(四):WaveNet——序列模型的卷积革新

本节内容的源代码仓库。 我们在前面的部分搭建了一个多层感知机字符级的语言模型,现在是时候把它的结构变的更复杂了。现在的目标是,输入序列能够输入更多字符,而不是现在的 3 个。除此之外,我们不想把它们都放到一个隐藏层中,避免压缩太多信息。这样得到一个类似WaveNet的更深的模型。…
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover

LLM演进史(三):批归一化——激活与梯度的统计调和

本节的重点在于,要对于训练过程中神经网络的激活,特别是向下流动的梯度有深刻的印象和理解。理解这些结构的发展历史是很重要的,因为 RNN (循环神经网络),作为一个通用逼近器 (universal approximator),它原则上可以实现所有的算法…
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover

GPT的现状

本文是对 Andrej Karpathy 的在 2023 年 3 月份的 Microsoft Build 演讲的整理。 演讲 Beamer 可见于:https://karpathy.ai/stateofgpt.pdf 演讲介绍了 GPT 的训练过程,发展地步,当前的 LLM…
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover

LLM演进史(二):词嵌入——多层感知器与语言的深层连接

本节的源代码仓库地址 本文算是训练语言模型的经典之作,Bengio 将神经网络引入语言模型的训练中,并得到了词嵌入这个副产物。词嵌入对后面深度学习在自然语言处理方面有很大的贡献,也是获取词的语义特征的有效方法。 论文的提出源于解决原词向量(one-hot 表示…
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover

LLM演进史(一):Bigram的简洁之道

本节的源代码仓库地址 前面我们通过实现micrograd,弄明白了梯度的意义和如何优化。现在我们可以进入到语言模型的学习阶段,了解初级阶段的语言模型是如何设计、建模的。 Bigram (一个字符通过一个计数的查找表来预测下一个字符。) MLP, 根据 Bengio et al…
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover
cover

从0实现一个极简的自动微分框架

代码仓库:https://github.com/karpathy/nn-zero-to-hero Andrej Karpathy 是著名深度学习课程 Stanford CS 231n 的作者与主讲师,也是 OpenAI 创始人之一,"micrograd" 是他创建的一个小型…
Ownership of this blog data is guaranteed by blockchain and smart contracts to the creator alone.