Paper page - Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
https://github.com/TencentARC/Open-MAGVIT2
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling (https://huggingface.co/papers/2408.01181) (2024)
* Show-o: One Single Transformer to Unify Multimodal Understanding and Generation (https://huggingface.co/papers/2408.12528) (2024)
* Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining (https://huggingface.co/papers/2408.02657) (2024)
* Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation (https://huggingface.co/papers/2407.17274) (2024)
* DiffX: Guide Your Layout to Cross-Modal Generative Modeling (https://huggingface.co/papers/2407.15488) (2024)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`
Authors: Zhuoyan Luo, Fengyuan Shi, Yixiao Ge, Yujiu Yang, Limin Wang, Ying Shan
Published: 2024-09-06
AI-generated summary
Open-MAGVIT2 models, ranging from 300M to 1.5B parameters, achieve state-of-the-art image reconstruction and demonstrate scalable auto-regressive generation with super-large token vocabularies.
We present Open-MAGVIT2, a family of auto-regressive image generation models
ranging from 300M to 1.5B parameters. The Open-MAGVIT2 project produces an
open-source replication of Google's MAGVIT-v2 tokenizer, a tokenizer with a
super-large codebook (i.e., 2^18 codes), and achieves state-of-the-art
reconstruction performance (1.17 rFID) on ImageNet 256×256. Furthermore, we
explore its application in plain auto-regressive models and validate its
scalability properties. To assist auto-regressive models in predicting over a
super-large vocabulary, we factorize it into two sub-vocabularies of different
sizes by asymmetric token factorization, and further introduce "next sub-token
prediction" to enhance sub-token interaction for better generation quality. We
release all models and code to foster innovation and creativity in the field of
auto-regressive visual generation.
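The asymmetric token factorization described in the abstract can be sketched as a simple index split: each 18-bit token id is decomposed into two sub-tokens drawn from sub-vocabularies of different sizes, so the model never has to predict over the full 2^18 entries at once. The 6-bit/12-bit split below is illustrative only; the paper does not state these exact sizes here.

```python
# Hedged sketch of asymmetric token factorization (illustrative split sizes).
# An index from a 2^18-entry codebook is split into two sub-tokens from
# sub-vocabularies of different sizes (here 2^6 = 64 and 2^12 = 4096).

PRE_BITS = 6    # first sub-vocabulary size: 2^6  (assumed, for illustration)
POST_BITS = 12  # second sub-vocabulary size: 2^12 (assumed, for illustration)

def factorize(token: int) -> tuple[int, int]:
    """Split an 18-bit token index into (pre, post) sub-tokens."""
    assert 0 <= token < 2 ** (PRE_BITS + POST_BITS)
    return token >> POST_BITS, token & ((1 << POST_BITS) - 1)

def defactorize(pre: int, post: int) -> int:
    """Recombine the two sub-tokens into the original codebook index."""
    return (pre << POST_BITS) | post

token = 123_456
pre, post = factorize(token)
print(pre, post)                         # the two sub-token ids
assert defactorize(pre, post) == token   # lossless round trip
```

Under this scheme, "next sub-token prediction" would condition the second (larger) sub-token on the first, rather than predicting both independently; the round-trip property above is what guarantees no codebook information is lost by the split.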