Nagi-ovo
Breezing homepage: [nagi.fun](nagi.fun)
PPO
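A "Speedrun" of PPO
Proximal Policy Optimization: we have finally reached one of the RL algorithms that has been especially popular in NLP over the past few years. In on-policy algorithms, the policy used to collect data is the same as the policy being trained. The drawback is that each batch of data must be discarded after a single use and new data collected, which makes training slow. The intuition behind PPO …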
RL
4 min
2 months ago
Ownership of this blog data is guaranteed by blockchain and smart contracts to the creator alone.
Blockchain ID
#61009
Owner
0x6380302480224d53ec4c2c318d1c7be2c55a7582
Transaction Hash
Creation 0xe99aa0c3...eadd5b56dd
Last Update 0x183e93de...ca7c86d2ef
IPFS Address
ipfs://bafkreid6e67datddz2iqadcmnro7m7dkhq3ocwp24mfkybtupiznpzoy7y