SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices
\n","updatedAt":"2026-01-15T01:36:41.578Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":318,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.6864010691642761},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2601.08303","authors":[{"_id":"696708d3c5e371f6b235d10c","user":{"_id":"655d812e668b64adf13d6382","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/12tDi8gkhM9s1Go6HSWAb.png","isPro":false,"fullname":"Dongting Hu","user":"timmy11hu","type":"user"},"name":"Dongting Hu","status":"claimed_verified","statusLastChangedAt":"2026-02-11T22:17:26.079Z","hidden":false},{"_id":"696708d3c5e371f6b235d10d","name":"Aarush Gupta","hidden":false},{"_id":"696708d3c5e371f6b235d10e","name":"Magzhan Gabidolla","hidden":false},{"_id":"696708d3c5e371f6b235d10f","name":"Arpit Sahni","hidden":false},{"_id":"696708d3c5e371f6b235d110","name":"Huseyin Coskun","hidden":false},{"_id":"696708d3c5e371f6b235d111","name":"Yanyu Li","hidden":false},{"_id":"696708d3c5e371f6b235d112","name":"Yerlan Idelbayev","hidden":false},{"_id":"696708d3c5e371f6b235d113","name":"Ahsan Mahmood","hidden":false},{"_id":"696708d3c5e371f6b235d114","name":"Aleksei Lebedev","hidden":false},{"_id":"696708d3c5e371f6b235d115","name":"Dishani Lahiri","hidden":false},{"_id":"696708d3c5e371f6b235d116","name":"Anujraaj Goyal","hidden":false},{"_id":"696708d3c5e371f6b235d117","name":"Ju Hu","hidden":false},{"_id":"696708d3c5e371f6b235d118","name":"Mingming 
Gong","hidden":false},{"_id":"696708d3c5e371f6b235d119","name":"Sergey Tulyakov","hidden":false},{"_id":"696708d3c5e371f6b235d11a","name":"Anil Kag","hidden":false}],"publishedAt":"2026-01-13T07:46:46.000Z","submittedOnDailyAt":"2026-01-14T00:39:31.241Z","title":"SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices","submittedOnDailyBy":{"_id":"6039478ab3ecf716b1a5fd4d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6039478ab3ecf716b1a5fd4d/_Thy4E7taiSYBLKxEKJbT.jpeg","isPro":true,"fullname":"taesiri","user":"taesiri","type":"user"},"summary":"Recent advances in diffusion transformers (DiTs) have set new standards in image generation, yet remain impractical for on-device deployment due to their high computational and memory costs. In this work, we present an efficient DiT framework tailored for mobile and edge devices that achieves transformer-level generation quality under strict resource constraints. Our design combines three key components. First, we propose a compact DiT architecture with an adaptive global-local sparse attention mechanism that balances global context modeling and local detail preservation. Second, we propose an elastic training framework that jointly optimizes sub-DiTs of varying capacities within a unified supernetwork, allowing a single model to dynamically adjust for efficient inference across different hardware. Finally, we develop Knowledge-Guided Distribution Matching Distillation, a step-distillation pipeline that integrates the DMD objective with knowledge transfer from few-step teacher models, producing high-fidelity and low-latency generation (e.g., 4-step) suitable for real-time on-device use. 
Together, these contributions enable scalable, efficient, and high-quality diffusion models for deployment on diverse hardware.","upvotes":18,"discussionId":"696708d3c5e371f6b235d11b","ai_summary":"An efficient diffusion transformer framework for mobile and edge devices that maintains high-generation quality while reducing computational costs through compact architecture, elastic training, and knowledge-guided distillation.","ai_keywords":["diffusion transformers","DiTs","sparse attention mechanism","elastic training","supernetwork","knowledge-guided distribution matching distillation","step-distillation pipeline","teacher models","high-fidelity generation","low-latency generation"],"organization":{"_id":"668450a2c1cbe5e008ac6515","name":"Snapchat","fullname":"Snapchat Inc.","avatar":"https://cdn-uploads.huggingface.co/production/uploads/648ca58a39d2584ee47efef6/plasFy052q2795odYb6NO.jpeg"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"6528a57bf0042c8301d217dc","avatarUrl":"/avatars/b7e1398aec545a0342c05c67c5493c8b.svg","isPro":false,"fullname":"HanSaem Kim","user":"kensaem","type":"user"},{"_id":"63c5d43ae2804cb2407e4d43","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1673909278097-noauth.png","isPro":false,"fullname":"xziayro","user":"xziayro","type":"user"},{"_id":"620783f24e28382272337ba4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/620783f24e28382272337ba4/zkUveQPNiDfYjgGhuFErj.jpeg","isPro":false,"fullname":"GuoLiangTang","user":"Tommy930","type":"user"},{"_id":"649be88f867d442094248239","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/noauth/NFwa74mbWNEBfHlt82hpp.jpeg","isPro":false,"fullname":"SAMBIT CHAKRABORTY","user":"sambitchakhf03","type":"user"},{"_id":"66b01ee8e53bbad918362856","avatarUrl":"/avatars/293529589a91dd7a95909d66727db224.svg","isPro":false,"fullname":"Anil 
Kag","user":"anilkagak2","type":"user"},{"_id":"624ac233c04d55ec0f42b11e","avatarUrl":"/avatars/58a9abce945e71a65abc8a54085de6d7.svg","isPro":false,"fullname":"oh sehun","user":"sehun","type":"user"},{"_id":"677272184d148b904333e874","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/noauth/5dUau7gxLk4Wm1TiiJJri.jpeg","isPro":false,"fullname":"Efstathios Karypidis","user":"Sta8is","type":"user"},{"_id":"6343e37f73b4f9cedab1c846","avatarUrl":"/avatars/2638af4626e8a4e3a95f845b94ad94f6.svg","isPro":false,"fullname":"Leheng Li","user":"lilelife","type":"user"},{"_id":"63ce2e251c8a5d1d7d82955f","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63ce2e251c8a5d1d7d82955f/SJWRp65EtQ0kgxPGDCElR.png","isPro":false,"fullname":"Kimiko","user":"Chat-Error","type":"user"},{"_id":"63f5e5599cbd673030247c26","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63f5e5599cbd673030247c26/1_YOgqqU8MUdr5lJiQEw2.jpeg","isPro":false,"fullname":"Spike Huang","user":"NEO946B","type":"user"},{"_id":"60c8d264224e250fb0178f77","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/60c8d264224e250fb0178f77/i8fbkBVcoFeJRmkQ9kYAE.png","isPro":false,"fullname":"Adam Lee","user":"Abecid","type":"user"},{"_id":"64276311eb9a0ed86180715b","avatarUrl":"/avatars/76f933cd549f10e5e2db379de235d304.svg","isPro":false,"fullname":"Aliaksandr Siarohin","user":"aliaksandr-siarohin","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0,"organization":{"_id":"668450a2c1cbe5e008ac6515","name":"Snapchat","fullname":"Snapchat Inc.","avatar":"https://cdn-uploads.huggingface.co/production/uploads/648ca58a39d2584ee47efef6/plasFy052q2795odYb6NO.jpeg"}}">
An efficient diffusion transformer framework for mobile and edge devices that maintains high generation quality while reducing computational costs through a compact architecture, elastic training, and knowledge-guided distillation.
AI-generated summary
Recent advances in diffusion transformers (DiTs) have set new standards in image generation, yet these models remain impractical for on-device deployment due to their high computational and memory costs. In this work, we present an efficient DiT framework tailored for mobile and edge devices that achieves transformer-level generation quality under strict resource constraints. Our design combines three key components. First, we propose a compact DiT architecture with an adaptive global-local sparse attention mechanism that balances global context modeling with local detail preservation. Second, we introduce an elastic training framework that jointly optimizes sub-DiTs of varying capacities within a unified supernetwork, allowing a single model to adjust dynamically for efficient inference across different hardware. Finally, we develop Knowledge-Guided Distribution Matching Distillation, a step-distillation pipeline that integrates the DMD objective with knowledge transfer from few-step teacher models, producing high-fidelity, low-latency generation (e.g., in 4 steps) suitable for real-time on-device use. Together, these contributions enable scalable, efficient, and high-quality diffusion models for deployment on diverse hardware.
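The abstract's first component, global-local sparse attention, can be illustrated with a minimal sketch. This is not the paper's implementation; it is a generic numpy toy showing the general pattern such mechanisms share: each query attends to keys in a local window plus a small set of global tokens, rather than to all n positions. The function name, the choice of the first `num_global` tokens as global, and all parameter values are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def global_local_attention(q, k, v, window=4, num_global=2):
    """Toy global-local sparse attention (hypothetical, not the paper's).

    Each of the n queries attends to (a) keys inside a local window of
    size `window` around its position and (b) the first `num_global`
    tokens treated as global context -- an O(n * (window + num_global))
    pattern instead of the dense O(n^2) one.
    """
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo, hi = max(0, i - window // 2), min(n, i + window // 2 + 1)
        idx = sorted(set(range(lo, hi)) | set(range(num_global)))
        scores = q[i] @ k[idx].T / np.sqrt(d)   # scaled dot-product scores
        out[i] = softmax(scores) @ v[idx]       # weighted sum of selected values
    return out
```

Note that when `window` and `num_global` cover the whole sequence, this degenerates to ordinary dense attention, which is a convenient sanity check for the sparse variant.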
Proposes efficient diffusion transformers for edge devices via sparse attention, elastic training, and knowledge-guided distillation to achieve high-fidelity, fast on-device image generation.
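The elastic training idea described above, sub-networks of different capacities sharing one set of supernetwork weights, can be sketched at the level of a single layer. The class below is a hypothetical illustration, not the paper's code: a sub-network of width w simply reuses the first w output channels of the shared weight matrix, so all widths can be trained jointly and one checkpoint can serve several hardware budgets.

```python
import numpy as np

class ElasticLinear:
    """Hypothetical elastic linear layer (illustrative, not the paper's).

    One shared weight matrix of maximal width `d_out_max` is stored; a
    sub-network of width w < d_out_max reuses its first w output rows.
    Training would sample widths and backpropagate through the shared
    weights, so every capacity is optimized jointly in one supernetwork.
    """
    def __init__(self, d_in, d_out_max, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out_max, d_in)) / np.sqrt(d_in)
        self.b = np.zeros(d_out_max)

    def forward(self, x, width):
        # Slice the shared parameters down to the requested capacity.
        return x @ self.W[:width].T + self.b[:width]
```

By construction, the small sub-network's output is exactly the first slice of the full network's output, which is what lets a single model "dynamically adjust" its capacity at inference time.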