HF Daily - a PeterLee6094 Collection
PeterLee6094's Collections · HF Daily · updated Aug 5, 2025
Large Language Diffusion Models
Paper • 2502.09992 • Published Feb 14, 2025 • 127

MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
Paper • 2502.10391 • Published Feb 14, 2025 • 34

Diverse Inference and Verification for Advanced Reasoning
Paper • 2502.09955 • Published Feb 14, 2025 • 18

Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models
Paper • 2502.08130 • Published Feb 12, 2025 • 9

Jailbreaking to Jailbreak
Paper • 2502.09638 • Published Feb 9, 2025 • 6

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Paper • 2502.11089 • Published Feb 16, 2025 • 170

ReLearn: Unlearning via Learning for Large Language Models
Paper • 2502.11190 • Published Feb 16, 2025 • 30

How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
Paper • 2502.11196 • Published Feb 16, 2025 • 23

CRANE: Reasoning with constrained LLM generation
Paper • 2502.09061 • Published Feb 13, 2025 • 21

One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
Paper • 2502.10454 • Published Feb 12, 2025 • 7

Dyve: Thinking Fast and Slow for Dynamic Process Verification
Paper • 2502.11157 • Published Feb 16, 2025 • 7

Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking
Paper • 2502.09083 • Published Feb 13, 2025 • 4

Continuous Diffusion Model for Language Modeling
Paper • 2502.11564 • Published Feb 17, 2025 • 53

Rethinking Diverse Human Preference Learning through Principal Component Analysis
Paper • 2502.13131 • Published Feb 18, 2025 • 37

SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models
Paper • 2502.12464 • Published Feb 18, 2025 • 28

Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
Paper • 2502.12215 • Published Feb 17, 2025 • 16

HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
Paper • 2502.12574 • Published Feb 18, 2025 • 13

The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1
Paper • 2502.12659 • Published Feb 18, 2025 • 7

Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey
Paper • 2502.10708 • Published Feb 15, 2025 • 4

Qwen2.5-VL Technical Report
Paper • 2502.13923 • Published Feb 19, 2025 • 218

On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective
Paper • 2502.14296 • Published Feb 20, 2025 • 45

Small Models Struggle to Learn from Strong Reasoners
Paper • 2502.12143 • Published Feb 17, 2025 • 39

LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
Paper • 2502.13922 • Published Feb 19, 2025 • 27

MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper • 2502.14499 • Published Feb 20, 2025 • 195

From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
Paper • 2502.14802 • Published Feb 20, 2025 • 13

RLPR: Extrapolating RLVR to General Domains without Verifiers
Paper • 2506.18254 • Published Jun 23, 2025 • 33

RePIC: Reinforced Post-Training for Personalizing Multi-Modal Language Models
Paper • 2506.18369 • Published Jun 23, 2025 • 2

LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning
Paper • 2506.18841 • Published Jun 23, 2025 • 56

Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset
Paper • 2506.18851 • Published Jun 23, 2025 • 30

ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
Paper • 2506.18896 • Published Jun 23, 2025 • 29

Robust Reward Modeling via Causal Rubrics
Paper • 2506.16507 • Published Jun 19, 2025 • 9

SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
Paper • 2506.19767 • Published Jun 24, 2025 • 15

OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling
Paper • 2506.20512 • Published Jun 25, 2025 • 47

ReCode: Updating Code API Knowledge with Reinforcement Learning
Paper • 2506.20495 • Published Jun 25, 2025 • 10

MMSearch-R1: Incentivizing LMMs to Search
Paper • 2506.20670 • Published Jun 25, 2025 • 64

Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
Paper • 2506.21506 • Published Jun 26, 2025 • 52

Deep Researcher with Test-Time Diffusion
Paper • 2507.16075 • Published Jul 21, 2025 • 68

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents
Paper • 2507.19478 • Published Jul 25, 2025 • 33

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Paper • 2507.19457 • Published Jul 25, 2025 • 31

Agentic Reinforced Policy Optimization
Paper • 2507.19849 • Published Jul 26, 2025 • 161

A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence
Paper • 2507.21046 • Published Jul 28, 2025 • 85

Geometric-Mean Policy Optimization
Paper • 2507.20673 • Published Jul 28, 2025 • 32

Goal Alignment in LLM-Based User Simulators for Conversational AI
Paper • 2507.20152 • Published Jul 27, 2025 • 5

Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
Paper • 2507.16806 • Published Jul 22, 2025 • 7

MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
Paper • 2507.21183 • Published Jul 27, 2025 • 15

Persona Vectors: Monitoring and Controlling Character Traits in Language Models
Paper • 2507.21509 • Published Jul 29, 2025 • 33

VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning
Paper • 2507.22607 • Published Jul 30, 2025 • 47

MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
Paper • 2507.21802 • Published Jul 29, 2025 • 19