Paper page - Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets
https://openreview.net/forum?id=cWnZLIdeKn
Authors: Marianna Nezhurina, Tomer Porian, Giovanni Pucceti, Tommie Kerssies, Romain Beaumont, Mehdi Cherti, Jenia Jitsev
AI-generated summary
Scaling laws are derived for CLIP and MaMMUT to compare their performance and sample efficiency across different scales and datasets.
In studies of transferable learning, scaling laws are obtained for various
important foundation models to predict their properties and performance at
larger scales. We show here how scaling law derivation can also be used for
model and dataset comparison, allowing one to decide which procedure should be
preferred for pre-training. For the first time, full scaling laws based on
dense measurements across a wide span of model sizes and samples seen are
derived for two important language-vision learning procedures, CLIP and MaMMUT,
which use either a contrastive-only loss or a combination of contrastive and
captioning (text-generative) losses. After ensuring sufficient prediction
accuracy on held-out points, we use the derived scaling laws to compare both
models, obtaining evidence for MaMMUT's stronger improvement with scale and
better sample efficiency than standard CLIP. To strengthen the validity of the
comparison, we derive scaling laws for various downstream tasks
(classification, retrieval, and segmentation) and for different open datasets
(DataComp, DFN, and Re-LAION), consistently observing the same trends. We show
that the comparison can also be performed when deriving scaling laws with a
constant learning rate schedule, reducing compute cost. Accurate derivation of
scaling laws thus provides a means to perform model and dataset comparison
across scale spans, avoiding misleading conclusions drawn from measurements at
single reference scales only, paving the way for systematic comparison and
improvement of open foundation models and the datasets used to create them. We
release all the pre-trained models with their intermediate checkpoints,
including openMaMMUT-L/14, which achieves 80.3% zero-shot ImageNet-1k accuracy,
trained on 12.8B samples from DataComp-1.4B. Code for reproducing the
experiments and the raw experimental data can be found at
https://github.com/LAION-AI/scaling-laws-for-comparison.
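The basic recipe behind such a comparison — fitting a power law to dense loss-versus-samples-seen measurements and validating its prediction on a held-out larger scale — can be sketched in a few lines. The snippet below is an illustrative toy example on synthetic data, not the paper's actual fitting code; the functional form, noise model, and scales are assumptions for demonstration only.

```python
import math
import random

def fit_power_law(samples, losses):
    """Fit L(S) = A * S^(-alpha) by ordinary least squares in log-log space.

    Returns (A, alpha). In log space the model is linear:
    log L = log A - alpha * log S.
    """
    xs = [math.log(s) for s in samples]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return math.exp(intercept), -slope  # A, alpha

# Synthetic "measurements": losses generated from a known power law
# with mild multiplicative noise, at scales from 10M to 10B samples seen.
rng = random.Random(0)
scales = [10 ** e for e in range(7, 11)]
true_A, true_alpha = 50.0, 0.20
losses = [true_A * s ** (-true_alpha) * (1 + rng.uniform(-0.01, 0.01))
          for s in scales]

# Fit on all but the largest scale; predict the held-out point and
# check the relative prediction error, as a proxy for scaling-law accuracy.
A, alpha = fit_power_law(scales[:-1], losses[:-1])
pred = A * scales[-1] ** (-alpha)
rel_err = abs(pred - losses[-1]) / losses[-1]
print(f"fitted alpha={alpha:.3f}, held-out relative error={rel_err:.3%}")
```

Once such fits are accurate on held-out scales for two training procedures, their fitted curves (e.g. the exponent alpha and the predicted loss at a target budget) can be compared directly across the whole scale span, rather than at a single reference point.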
In this work, scaling laws are used for the first time to robustly compare open foundation language-vision models and datasets, taking the important models openCLIP and openMaMMUT and the open datasets DataComp, DFN, and Re-LAION as examples. Following the scaling-law-based comparison, openMaMMUT-L/14 is trained on 12.8B samples from DataComp-1.4B, achieving 80.34% zero-shot ImageNet-1k classification accuracy, outperforming or matching, as a fully open-source model, other open-weight models of the same compute class.