Pico: A Modular Framework for Hypothesis-Driven Small Language Model Research
Welcome to Pico LM 👋, a research initiative dedicated to demystifying language model learning.
We create two complementary frameworks (pico-train and pico-analyze) for training and analyzing small to mid-scale language models (1M–1B parameters). Our mission is to provide a transparent, research-oriented workflow that illuminates how these models learn.
For full documentation and code, visit our two main repositories:
- pico-train: Minimalist training framework for language models.
- pico-analyze: Tools for measuring and visualizing model learning dynamics across checkpoints.
This HuggingFace organization hosts our pre-trained models and datasets, while the GitHub repositories provide the code to train and analyze your own model suites from scratch.
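If you prefer to browse these resources programmatically, the huggingface_hub client can enumerate everything the organization hosts. A minimal sketch (the exact repositories returned depend on what is currently public):

```python
# List the model and dataset repositories hosted under the pico-lm organization.
from huggingface_hub import list_datasets, list_models

models = list_models(author="pico-lm")
datasets = list_datasets(author="pico-lm")

print("Models:", sorted(m.id for m in models))
print("Datasets:", sorted(d.id for d in datasets))
```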
All code and artifacts are released under the permissive Apache-2.0 license.
Pro Tip 🚀: To learn more about these libraries and explore detailed tutorials, visit our official website picolm.io and get fully acquainted with the Pico ecosystem.
🤗 HuggingFace Resources (You Are Here)
1. Pre-trained Model Suite
Our complete suite of models, ranging from 11M to 570M parameters, trained with Pico:
🚧 Disclaimer: These models are still under construction. The models released here have been trained for 125,000 steps (roughly 250B tokens); training will conclude at 200,000 steps.
🚧 Coming Soon! pico-decoder-xl (1B+ parameters). Watch this space or star our GitHub repository for updates!
All models are trained on the pretokenized-dolma dataset. They all see the same training data at each training step, use the same optimization process, and share the same model architecture; the only difference between models is the size of their hidden dimension.
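Because every suite member exposes the same interface, you can load any of them the same way. A hedged sketch with the transformers library (the trust_remote_code flag and the tokenizer repo id are assumptions; adjust them to your setup):

```python
# Minimal sketch: load one suite member and run a forward pass.
# Assumptions: the repo ships a custom architecture loaded via trust_remote_code,
# and the forward pass follows the standard causal-LM interface (input_ids -> logits).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "pico-lm/pico-decoder-tiny",
    trust_remote_code=True,  # Pico decoders use a custom model class
)
# The suite uses the OLMo tokenizer; this repo id is an assumption, substitute
# whichever tokenizer you use with pico-train if it differs.
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-0724-hf")

inputs = tokenizer("Small models, big insights.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # assumes a standard CausalLMOutput
print(logits.shape)  # (batch, sequence_length, vocab_size = 50,280)
```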
In each model repository, we version-control a checkpoint every 1,000 steps (see the loading sketch after this list); each checkpoint contains:
- Weights and optimizer states (HuggingFace and Lightning Fabric-compatible versions)
- Model activations and gradients
- The batch of training data observed at the given training step
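To study a specific point in training, you can first discover which per-step checkpoints a repository exposes and then load that revision. A sketch assuming the checkpoints are stored as git branches (use whatever branch names list_repo_refs actually reports):

```python
# Sketch: enumerate a repo's checkpoint revisions, then load one of them.
from huggingface_hub import list_repo_refs
from transformers import AutoModelForCausalLM

refs = list_repo_refs("pico-lm/pico-decoder-tiny")
branches = [b.name for b in refs.branches]
print(branches)  # inspect which training steps are available

model = AutoModelForCausalLM.from_pretrained(
    "pico-lm/pico-decoder-tiny",
    revision=branches[0],    # pick the training step you want to analyze
    trust_remote_code=True,  # custom Pico decoder class, as above
)
```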
We visualize the learning process in our Wandb project.
Model Details:

| Aspect | Details |
| --- | --- |
| Architecture | Llama-style transformer (decoder-only); RMSNorm normalization; RoPE (Rotary Positional Embeddings); multi-head attention with KV-cache; SwiGLU activation function |
| Sequence Length | 2048 |
| Batch Size | 1024 |
| Optimizer | AdamW |
| Learning Rate | 3e-4 (one-cycle warmup) |
| Gradient Clipping | 1.0 |
| Precision | Mixed precision training |
| Vocabulary Size | 50,280 |
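For readers who want to reproduce the optimization recipe outside of pico-train, the table translates into a standard PyTorch setup. A sketch under stated assumptions (the warmup fraction and the use of OneCycleLR specifically are assumptions; pico-train's own config is the source of truth):

```python
# Sketch of an AdamW + one-cycle schedule + gradient clipping loop matching the table.
# The warmup fraction (pct_start) is an assumed value, not pico-train's exact setting.
import torch

model = torch.nn.Linear(16, 16)  # placeholder module standing in for the decoder

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=3e-4,          # peak learning rate from the table
    total_steps=200_000,  # full training run length
    pct_start=0.1,        # assumed warmup fraction
)

for step in range(10):  # illustrative loop; real training iterates over the dataloader
    optimizer.zero_grad()
    loss = model(torch.randn(4, 16)).pow(2).mean()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping at 1.0
    optimizer.step()
    scheduler.step()
```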
2. Datasets
pretokenized-dolma
- 420B tokens of pre-processed, tokenized, and shuffled text extracted from the Dolma corpus
- We use this dataset to train our model suite
pretokenized-dolma-tinsy
- A smaller version of the pretokenized-dolma corpus for quick experiments
pretokenized-paloma
- A tokenized and shuffled version of the Paloma evaluation corpus
- The Paloma corpus was carefully curated to be disjoint from the Dolma corpus
- We use this corpus to evaluate the perplexity of our models
pretokenized-paloma-tinsy
- A sub-sampled version of the pretokenized-paloma corpus
All datasets are tokenized using the OLMo Tokenizer.
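The datasets are standard Hub datasets, so they can be streamed for quick experiments without downloading everything. A minimal sketch (the split name "train" and the token-id column name are assumptions; inspect one record to confirm):

```python
# Sketch: stream a small pre-tokenized split and inspect a single record.
from datasets import load_dataset

ds = load_dataset("pico-lm/pretokenized-dolma-tinsy", split="train", streaming=True)
example = next(iter(ds))
print(example.keys())  # confirm the actual column names before relying on them
# If rows are pre-chunked to the 2048-token context, a token-id column will have
# length 2048; the column name (e.g. "input_ids") is an assumption.
```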
🔍 Citation
If you use Pico in academic or professional work, please cite it:
@inproceedings{diehl-martinez-etal-2025-pico,
title = "Pico: A Modular Framework for Hypothesis-Driven Small Language Model Research",
author = "Diehl Martinez, Richard and
Africa, David Demitri and
Weiss, Yuval and
Salhan, Suchir and
Daniels, Ryan and
Buttery, Paula",
editor = {Habernal, Ivan and
Schulam, Peter and
Tiedemann, J{\"o}rg},
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
}
Thanks for checking out Pico!
Star our GitHub repositories or join our community discussions to stay updated. If you find a bug or have questions, open an issue—contributions are welcome!