Paper page - Prescriptive Scaling Reveals the Evolution of Language Model Capabilities

arxiv:2602.15327

Prescriptive Scaling Reveals the Evolution of Language Model Capabilities

Published on Feb 17, 2026 · Submitted by hlzhang109 on Feb 18, 2026
Authors: Hanlin Zhang, Jikai Jin, Vasilis Syrgkanis, Sham Kakade
Project page: https://jkjin.com/prescriptive-scaling

Abstract

AI-generated summary

A large-scale observational analysis estimates capability boundaries and performance predictions for foundation models using quantile regression, and evaluates their temporal stability across tasks.

For deploying foundation models, practitioners increasingly need prescriptive scaling laws: given a pre-training compute budget, what downstream accuracy is attainable with contemporary post-training practice, and how stable is that mapping as the field evolves? Using large-scale observational evaluations comprising 5k existing and 2k newly sampled records of model performance, we estimate capability boundaries, i.e., high conditional quantiles of benchmark scores as a function of log pre-training FLOPs, via smoothed quantile regression with a monotone, saturating sigmoid parameterization. We validate temporal reliability by fitting on earlier model generations and evaluating on later releases. Across various tasks, the estimated boundaries are mostly stable, with the exception of math reasoning, which exhibits a consistently advancing boundary over time. We then extend our approach to analyze task-dependent saturation and to probe contamination-related shifts on math reasoning tasks. Finally, we introduce an efficient algorithm that recovers near-full-data frontiers using roughly 20% of the evaluation budget. Together, our work releases Proteus-2k, an up-to-date model performance evaluation dataset, and introduces a practical methodology for translating compute budgets into reliable performance expectations and for monitoring when capability boundaries shift over time.
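
The estimator is only described at a high level in the abstract; the sketch below is a rough illustration of the idea, not the authors' implementation. It fits a high conditional quantile of benchmark scores as a monotone, saturating sigmoid of log pre-training FLOPs using a softplus-smoothed pinball (quantile) loss. The function names, the particular smoothing, the parameter bounds, and the synthetic data are all assumptions made for illustration.

```python
# Minimal sketch (assumption, not the paper's code): estimate a tau-quantile
# "capability boundary" as a monotone, saturating sigmoid of log pre-training
# FLOPs, fitted with a softplus-smoothed pinball (quantile) loss.
import numpy as np
from scipy.optimize import minimize

def sigmoid_boundary(log_flops, params):
    """Saturating sigmoid: floor + (ceiling - floor) / (1 + exp(-k * (x - x0)))."""
    floor, ceiling, k, x0 = params
    return floor + (ceiling - floor) / (1.0 + np.exp(-k * (log_flops - x0)))

def smoothed_pinball(u, tau, eps=1e-2):
    """Softplus-smoothed quantile loss; recovers the exact pinball loss as eps -> 0."""
    return tau * u + eps * np.logaddexp(0.0, -u / eps)

def fit_boundary(log_flops, scores, tau=0.95):
    """Fit the tau-th conditional quantile of scores given log pre-training FLOPs."""
    def objective(params):
        residual = scores - sigmoid_boundary(log_flops, params)
        return smoothed_pinball(residual, tau).mean()

    init = np.array([scores.min(), scores.max(), 1.0, np.median(log_flops)])
    # Bounds keep the floor/ceiling inside [0, 1] and the slope positive,
    # so the fitted boundary is monotone and saturating by construction.
    bounds = [(0.0, 1.0), (0.0, 1.0), (1e-3, 10.0), (15.0, 30.0)]
    return minimize(objective, init, method="L-BFGS-B", bounds=bounds).x

# Synthetic stand-in for (model, benchmark) evaluations.
rng = np.random.default_rng(0)
log_flops = rng.uniform(20.0, 26.0, 500)                 # log10 pre-training FLOPs
frontier = 0.1 + 0.8 / (1.0 + np.exp(-2.0 * (log_flops - 23.0)))
scores = np.clip(frontier - rng.exponential(0.1, 500), 0.0, 1.0)
print("fitted (floor, ceiling, slope, midpoint):", fit_boundary(log_flops, scores))
```

The smoothing term only serves to keep the objective differentiable near zero residual; with a derivative-free optimizer the plain pinball loss would work as well.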

Community

Paper author · Paper submitter

We introduce prescriptive scaling, a framework for predicting the attainable downstream performance of language models given a fixed pre-training compute budget. Rather than modeling average trends, we estimate high-quantile capability boundaries using monotone sigmoid quantile regression and show that post-training performance is largely predictable and stable over time for most tasks. We find that math reasoning is a notable exception, with a boundary that continues to advance across model generations. We also release the PROTEUS-2K dataset and propose an efficient sampling method that recovers near-full performance frontiers with a fraction of the evaluation cost.
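
The temporal-reliability claim above (fit on earlier releases, check later ones) can be illustrated with a simple backtest. The sketch below is an assumption-laden stand-in, not the paper's procedure: it uses a piecewise-constant binned quantile in place of the sigmoid fit, synthetic data in place of PROTEUS-2K, and simply measures how often later releases exceed the boundary fitted on earlier ones.

```python
# Minimal sketch (assumption, not the paper's exact procedure): check temporal
# stability of a capability boundary by fitting on models released before a
# cutoff and measuring how often later releases exceed that boundary.
import numpy as np
import pandas as pd

def binned_quantile_boundary(df, tau=0.95, n_bins=8):
    """Piecewise-constant high-quantile boundary over log-FLOPs bins
    (a stand-in for the sigmoid quantile fit; assumes every bin is populated)."""
    bins = np.linspace(df["log_flops"].min(), df["log_flops"].max(), n_bins + 1)
    labels = pd.cut(df["log_flops"], bins, include_lowest=True)
    boundary = df.groupby(labels, observed=True)["score"].quantile(tau)
    return bins, boundary.to_numpy()

def exceedance_rate(df, bins, boundary):
    """Fraction of models whose score lies above the fitted boundary."""
    idx = np.clip(np.digitize(df["log_flops"], bins) - 1, 0, len(boundary) - 1)
    return float((df["score"].to_numpy() > boundary[idx]).mean())

# Synthetic stand-in for an evaluation table: one row per (model, benchmark).
rng = np.random.default_rng(1)
n = 800
df = pd.DataFrame({
    "log_flops": rng.uniform(20.0, 26.0, n),
    "release": pd.to_datetime("2023-01-01") + pd.to_timedelta(rng.integers(0, 900, n), "D"),
})
df["score"] = np.clip(0.1 + 0.8 / (1.0 + np.exp(-2.0 * (df["log_flops"] - 23.0)))
                      - rng.exponential(0.1, n), 0.0, 1.0)

cutoff = pd.Timestamp("2024-06-01")
train, test = df[df["release"] < cutoff], df[df["release"] >= cutoff]
bins, boundary = binned_quantile_boundary(train, tau=0.95)
print("later-release exceedance rate:", exceedance_rate(test, bins, boundary))
# A rate far above (1 - tau) would suggest the boundary has advanced over time,
# as the paper reports for math reasoning; a rate near (1 - tau) suggests stability.
```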

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2602.15327 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2602.15327 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2602.15327 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.