nanotron (Nanotron Research)

The Ultra-Scale Playbook: Training LLMs on GPU Clusters

Essential reading for anyone scaling ML infrastructure
The knowledge of how to efficiently scale training to large GPU clusters has been well kept within a handful of big industry labs. With this book, we set out to lift the veil and release a comprehensive resource on distributed training.
AUTHORS
Nouamane Tazi, Ferdinand Mom, Haojun Zhao, Phuc Nguyen, Mohamed Mekkouri, Leandro von Werra, Thomas Wolf
AFFILIATION
Hugging Face
PUBLISHED
Jul 30, 2025
This book PDF is accessible with a PRO subscription.

Subscribe to PRO
(*If you experience issues downloading the PDF with Chrome, try restarting or updating the browser, or use a different one.)


The Nanotron team focuses on sharing open knowledge and developing open-source libraries for efficient distributed training of large-scale AI models.


Some of its contributions are:

The Ultra-Scale Playbook (nanotron/ultrascale-playbook), an interactive guide to training LLMs on large GPU clusters

Predict Memory (nanotron/predict_memory), a tool that estimates GPU memory usage for transformer training

14 models and 15 datasets published under the nanotron organization on Hugging Face

AI & ML interests

Large scale distributed AI model training, model parallelisation, low-level GPU acceleration, make GPUs go brrrrr


