Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation
Abstract
Idea to Image uses iterative multimodal refinement with GPT-4V to enhance text-to-image generation by refining prompts based on model characteristics, leading to higher-quality images.
We introduce "Idea to Image" (Idea2Img), a system that enables multimodal iterative self-refinement with GPT-4V(ision) for automatic image design and generation. Humans can quickly identify the characteristics of different text-to-image (T2I) models through iterative exploration, which lets them efficiently convert high-level generation ideas into effective T2I prompts that produce good images. We investigate whether systems based on large multimodal models (LMMs) can develop analogous multimodal self-refinement abilities, enabling them to explore unknown models or environments via self-refining attempts. Idea2Img cyclically generates revised T2I prompts to synthesize draft images and provides directional feedback for prompt revision, both conditioned on its memory of the probed T2I model's characteristics. This iterative self-refinement gives Idea2Img several advantages over vanilla T2I models: notably, it can process input ideas with interleaved image-text sequences, follow ideas with design instructions, and generate images of better semantic and visual quality. A user preference study validates the efficacy of multimodal iterative self-refinement for automatic image design and generation.
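The loop the abstract describes can be sketched in code. This is a minimal, hypothetical sketch only: the functions `refine_prompt`, `generate_image`, and `assess` are stand-in stubs invented here to show the control flow (in the actual system these would be GPT-4V calls and an external T2I model), and the scoring is a placeholder.

```python
# Hypothetical sketch of an Idea2Img-style self-refinement loop.
# All three helper functions below are illustrative stubs, not the
# paper's implementation: the real system calls GPT-4V for prompt
# revision and feedback, and a text-to-image model for drafts.

def refine_prompt(idea, feedback, memory):
    """Stub LMM call: revise the T2I prompt using feedback and memory."""
    return f"{idea} | revision {len(memory)}"

def generate_image(prompt):
    """Stub T2I call: return a placeholder 'draft image'."""
    return f"image({prompt})"

def assess(idea, image, memory):
    """Stub LMM feedback: score the draft and give directional feedback."""
    score = len(memory)  # placeholder: pretend quality improves each round
    return score, f"feedback on {image}"

def idea2img_loop(idea, max_iters=3):
    memory = []      # memory of the probed T2I model's characteristics
    feedback = None
    best = (float("-inf"), None)
    for _ in range(max_iters):
        prompt = refine_prompt(idea, feedback, memory)   # prompt revision
        image = generate_image(prompt)                   # draft synthesis
        score, feedback = assess(idea, image, memory)    # directional feedback
        memory.append((prompt, feedback))                # update memory
        if score > best[0]:
            best = (score, image)
    return best[1]
```

The key design point the abstract emphasizes is that both the prompt revision and the feedback step are conditioned on the accumulated memory of the T2I model's behavior, so later iterations exploit what earlier probes revealed.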
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models (2023)
- OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation (2023)
- MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens (2023)
- ObjectComposer: Consistent Generation of Multiple Objects Without Fine-tuning (2023)
- WorldSmith: Iterative and Expressive Prompting for World Building with a Generative AI (2023)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space
Very interesting idea to use an LLM to refine and expand prompts for better image generation. Do you have a demo of this online? Is the code open source?
Very interesting idea to use an LLM to refine and expand prompts for better image generation. Do you have a demo of this online? Is the code open source?
Thank you for your interest. We are preparing the code and will release it soon. Thanks.
Do you have a release date? I'm dying to test this technology!
Very interesting idea to use an LLM to refine and expand prompts for better image generation. Do you have a demo of this online? Is the code open source?
Thank you for your interest. We are preparing the code and will release it soon. Thanks.
When are you going to release it?
Please release a demo where we can try it.
Models citing this paper: 0
No model linking this paper
Datasets citing this paper: 0
No dataset linking this paper
Spaces citing this paper: 0
No Space linking this paper