Models from the paper "LaSeR: Reinforcement Learning with Last-Token Self-Rewarding"
Wenkai Yang
Keven16
AI & ML interests
None yet
Recent Activity
authored a paper 3 days ago
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
authored a paper 3 days ago
Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning
updated a dataset 3 days ago
Keven16/G-OPD-Training-Data
Organizations
None yet