Menu
Avatar
The menu of my blog
Quick Stats
Quests
31 Quests
Messages
2 Messages
Playback
5 Playback
Items
14 Items
Skills
2 Skills
Trace
1 Trace
Message

The Sword Art Online Utilities Project

Welcome, traveler. This is a personal blog built in the style of the legendary SAO game interface. Navigate through the menu to explore the journal, skills, and item logs.

© 2020-2026 Nagi-ovo | RSS | Breezing
Quests

#RLHF

1 post

From RL to RLHF

From RL to RLHF

May 8, 2025, 02:15 PM 50 min read

This article is primarily based on Umar Jamil's course for learning and recording purposes. Our goal is to align LLM behavior with our desired outputs, and RLHF is one of the most famous techniques for this.

Deep LearningRLHFLLM
Session 00:00:00