RegularizedSelfPlay (Regularized Self-Play)
RegularizedSelfPlay/sppo_reversekl-0.1-Gemma-2-2B-IT-RSPO-Iter3 Text Generation • 3B • Updated Aug 11, 2025 • 2
RegularizedSelfPlay/sppo_reversekl-0.1-Gemma-2-2B-IT-RSPO-Iter2 Text Generation • 3B • Updated Aug 11, 2025 • 2
RegularizedSelfPlay/sppo_reversekl-0.1-Gemma-2-2B-IT-RSPO-Iter1 Text Generation • 3B • Updated Aug 11, 2025 • 1
RegularizedSelfPlay/Gemma-2-2B-SPPO-It-Iter1 Text Generation • 3B • Updated Aug 11, 2025 • 2
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2-gp-8b-gpm-reg0.5-sppo-reversekl-table Text Generation • 8B • Updated Jul 30, 2025 • 3
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2-gp-8b-gpm-reg0.05-sppo-reversekl-table Text Generation • 8B • Updated Jul 30, 2025 • 5
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter1-gp-8b-gpm-reg0.05-sppo-reversekl-table Text Generation • 8B • Updated Jul 30, 2025 • 3
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2-gp-8b-gpm-reg0.1-sppo-forwardimportance10-table Text Generation • 8B • Updated Jul 30, 2025 • 4
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter3-gp-8b-gpm-reg0.5-sppo-reversekl-table Text Generation • 8B • Updated Jul 30, 2025 • 5
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter1-gp-8b-gpm-reg0.5-sppo-reversekl-table Text Generation • 8B • Updated Jul 30, 2025 • 3
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter3-gp-8b-gpm-table Text Generation • 8B • Updated Jul 30, 2025 • 6
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2-gp-8b-gpm-table Text Generation • 8B • Updated Jul 30, 2025 • 2
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter3-gp-8b-gpm-reg0.1-sppo-forwardimportance10-table Text Generation • 8B • Updated Jul 30, 2025 • 6
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter3-gp-8b-gpm-reg0.05-sppo-reversekl-table Text Generation • 8B • Updated Jul 30, 2025 • 5
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter1-gp-8b-gpm-reg0.1-sppo-forwardimportance10-table Text Generation • 8B • Updated Jul 30, 2025 • 4
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter1-gp-8b-gpm-table Text Generation • 8B • Updated Jul 30, 2025 • 5
RegularizedSelfPlay/Llama-3-8B-Instruct-GPM-8B-SPPO-Iter1 Updated Jul 29, 2025
RegularizedSelfPlay/Mistral-7B-Instruct-GPM-8B-SPPO-Iter1 Text Generation • 8B • Updated Jul 29, 2025 • 6
RegularizedSelfPlay/Mistral-7B-Instruct-GPM-SPPO-Iter2 Text Generation • 7B • Updated Jul 28, 2025 • 4
RegularizedSelfPlay/Mistral-7B-Instruct-GPM-SPPO-Iter1 Text Generation • 7B • Updated Jul 28, 2025 • 4
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter3 8B • Updated Mar 29, 2025 • 5
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter2 8B • Updated Mar 29, 2025 • 4
RegularizedSelfPlay/Llama-3-8B-Instruct-SPPO-Iter1 8B • Updated Mar 29, 2025 • 1
RegularizedSelfPlay/sppo_reversekl-0.5-Llama-3-8B-Instruct-RSPO-Iter3 8B • Updated Mar 29, 2025 • 5
RegularizedSelfPlay/sppo_reversekl-0.5-Llama-3-8B-Instruct-RSPO-Iter2 8B • Updated Mar 29, 2025 • 4
RegularizedSelfPlay/sppo_reversekl-0.5-Llama-3-8B-Instruct-RSPO-Iter1 8B • Updated Mar 29, 2025 • 1
RegularizedSelfPlay/sppo_forward1reverse5-0.1-Llama-3-8B-Instruct-RSPO-Iter1 Updated Mar 28, 2025
RegularizedSelfPlay/sppo_reverseklnoent-0.5-PromptABC-Mistral-7B-Instruct-SPPO-Iter2 7B • Updated Mar 27, 2025 • 1
RegularizedSelfPlay/sppo_reverseklnoent-0.5-PromptABC-Mistral-7B-Instruct-SPPO-Iter3 7B • Updated Mar 27, 2025 • 3
RegularizedSelfPlay/sppo_reverseklnoent-0.5-PromptABC-Mistral-7B-Instruct-SPPO-Iter1 7B • Updated Mar 27, 2025 • 1