
The Sword Art Online Utilities Project

Welcome, traveler. This is a personal blog built in the style of the legendary SAO game interface. Navigate through the menu to explore the journal, skills, and item logs.

© 2020-2026 Nagi-ovo | RSS | Breezing
Quests

#RLHF

1 post

From RL to RLHF

May 8, 2025, 14:15 · 50 min read

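This post is mainly a record of what I learned from Umar Jamil's course. Our goal is to align an LLM's behavior with the outputs we expect, and RLHF is one of the best-known techniques for doing so.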

Deep Learning · RLHF · LLM