    Pico: A Modular Framework for Hypothesis-Driven Small Language Model Research

    Welcome to Pico LM 👋, a research initiative dedicated to demystifying language model learning.

    We create two complementary frameworks (pico-train and pico-analyze) for training and analyzing small to mid-scale language models (1M–1B parameters). Our mission is to provide a transparent, research-oriented workflow that illuminates how these models learn.

    For full documentation and code, visit our two main repositories:

    • pico-train: Minimalist training framework for language models.
    • pico-analyze: Tools for measuring and visualizing model learning dynamics across checkpoints.

    This HuggingFace organization hosts our pre-trained models and datasets, while the GitHub repository provides the code to train and analyze your own model suites from scratch.

    All code and artifacts are licensed under the permissive Apache-2.0 license.

    Pro Tip 🚀: To learn more about these libraries and explore detailed tutorials, visit our official website picolm.io and get fully acquainted with the Pico ecosystem.


    🤗 HuggingFace Resources (You Are Here)

    1. Pre-trained Model Suite

    Our complete suite of models, ranging from 11M to 570M parameters, all trained with Pico:

    🚧 Disclaimer: These models are still under construction. The models released here have been trained for 125,000 steps (corresponding to ~250B tokens); training will conclude at 200,000 steps.

    🚧 Coming Soon! pico-decoder-xl (1B+ parameters). Watch this space or star our GitHub repository for updates!

    All models are trained on the pretokenized-dolma dataset. They all see the same training data at each training step, use the same optimization process, and share the same model architecture; the only difference between models is the size of their hidden dimension.

    In each model repository, we version-control checkpoints every 1,000 steps; each checkpoint contains:

    • Weights and optimizer states (HuggingFace and Lightning Fabric-compatible versions)
    • Model activations and gradients
    • The batch of training data observed at the given training step

    We visualize the learning process in our Wandb project.
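
    As an illustration, a released model (and, via a revision argument, one of its per-step checkpoints) can be loaded with the transformers library. This is a minimal sketch: pico-decoder-tiny is a real repository in this organization, but the revision name for intermediate checkpoints is an assumption -- check each model repo's branch list for the exact naming scheme, and add trust_remote_code=True if the repo ships a custom model class.

    # Minimal sketch: load a pico-decoder model from the Hugging Face Hub
    # and generate a few tokens.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "pico-lm/pico-decoder-tiny"  # 11M-parameter member of the suite

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        # Hypothetical branch name for a per-step checkpoint; the actual
        # revision naming scheme may differ -- see the repo's branch list.
        revision="step_125000",
    )

    inputs = tokenizer("Language models learn by", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))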

    Model Details:

    Aspect               Details
    -------------------  -----------------------------------------------
    Architecture         Llama-style transformer (decoder-only)
                         - RMSNorm normalization
                         - RoPE (Rotary Positional Embeddings)
                         - Multi-head attention with KV-cache
                         - SwiGLU activation function
    Sequence Length      2048
    Batch Size           1024
    Optimizer            AdamW
    Learning Rate        3e-4 (one-cycle warmup)
    Gradient Clipping    1.0
    Precision            Mixed precision training
    Vocabulary Size      50,280
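
    To make the optimization rows concrete, here is an illustrative PyTorch sketch (not the pico-train source) of AdamW with gradient clipping at 1.0 and a one-cycle schedule peaking at the 3e-4 learning rate; the total step count follows the training disclaimer above, while the warmup fraction and other schedule details are assumptions.

    # Illustrative sketch only -- hyperparameters mirror the table above,
    # but the exact schedule used by pico-train may differ.
    import torch

    model = torch.nn.Linear(512, 512)  # stand-in for a pico-decoder model

    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer,
        max_lr=3e-4,          # peak learning rate from the table
        total_steps=200_000,  # planned length of the full training run
    )

    for step in range(3):  # placeholder loop; real batches come from pretokenized-dolma
        loss = model(torch.randn(8, 512)).pow(2).mean()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip at 1.0
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()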

    2. Datasets

    1. pretokenized-dolma

      • 420B tokens of pre-processed, tokenized, and shuffled text extracted from the DOLMA corpus
      • We use this dataset to train our model suite
    2. pretokenized-dolma-tinsy

      • A smaller version of the pretokenized-dolma corpus for quick experiments
    3. pretokenized-paloma

      • A tokenized and shuffled version of the Paloma evaluation corpus
      • The Paloma corpus was carefully curated to be disjoint from the Dolma corpus, making it a clean held-out evaluation set
      • We use this corpus to evaluate the perplexity of our models
    4. pretokenized-paloma-tinsy

      • A sub-sampled version of the pretokenized-paloma corpus

    All datasets are tokenized using the OLMo Tokenizer.
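
    As a usage sketch, the pretokenized corpora can be streamed with the datasets library and fed directly to a model for perplexity evaluation. The column name "input_ids" and the split name below are assumptions -- check the dataset viewer for the exact schema.

    # Minimal sketch: stream pretokenized evaluation data and estimate
    # perplexity with a pico-decoder model. Column/split names are assumed.
    import math
    import torch
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM

    dataset = load_dataset("pico-lm/pretokenized-paloma-tinsy", split="train", streaming=True)
    model = AutoModelForCausalLM.from_pretrained("pico-lm/pico-decoder-tiny")
    model.eval()

    total_nll, total_tokens = 0.0, 0
    for i, example in enumerate(dataset):
        if i >= 8:  # evaluate a handful of sequences for illustration
            break
        input_ids = torch.tensor(example["input_ids"]).unsqueeze(0)
        with torch.no_grad():
            # With labels=input_ids, the model returns the mean per-token
            # negative log-likelihood as `loss`.
            out = model(input_ids, labels=input_ids)
        n_tokens = input_ids.numel() - 1  # next-token prediction targets
        total_nll += out.loss.item() * n_tokens
        total_tokens += n_tokens

    print("perplexity:", math.exp(total_nll / total_tokens))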


    🔍 Citation

    If you use Pico in academic or professional work, please cite it:

    @inproceedings{diehl-martinez-etal-2025-pico,
        title = "Pico: A Modular Framework for Hypothesis-Driven Small Language Model Research",
        author = "Diehl Martinez, Richard  and
          Africa, David Demitri  and
          Weiss, Yuval  and
          Salhan, Suchir  and
          Daniels, Ryan  and
          Buttery, Paula",
        editor = {Habernal, Ivan  and
          Schulam, Peter  and
          Tiedemann, J{\"o}rg},
        booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
        month = nov,
        year = "2025",
        address = "Suzhou, China",
        publisher = "Association for Computational Linguistics",
    }
    

    Thanks for checking out Pico!
    Star our GitHub repositories or join our community discussions to stay updated. If you find a bug or have questions, open an issue—contributions are welcome!