Paper page - L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects
Authors: Yutaro Yamada, Khyathi Chandu, Yuchen Lin, Jack Hessel, Ilker Yildirim, Yejin Choi
Published: 2024-02-14 (arXiv: 2402.09052)
A language agent with chain-of-3D-thoughts (L3GO) improves 3D object generation from unconventional descriptions by using large language models to compose objects part by part in a 3D simulation environment. It surpasses GPT-4 and other language agents on ShapeNet mesh generation, and outperforms text-to-2D image and text-to-3D models on the UFO benchmark.
AI-generated summary
Diffusion-based image generation models such as DALL-E 3 and Stable
Diffusion-XL demonstrate remarkable capabilities in generating images with
realistic and unique compositions. Yet, these models are not robust in
precisely reasoning about physical and spatial configurations of objects,
especially when instructed with unconventional, and therefore out-of-distribution,
descriptions, such as "a chair with five legs". In this paper, we propose a
language agent with chain-of-3D-thoughts (L3GO), an inference-time approach
that can reason about part-based 3D mesh generation of unconventional objects
that current data-driven diffusion models struggle with. More concretely, we
use large language models as agents to compose a desired object via
trial-and-error within the 3D simulation environment. To facilitate our
investigation, we develop a new benchmark, Unconventionally Feasible Objects
(UFO), as well as SimpleBlenv, a wrapper environment built on top of Blender
where language agents can build and compose atomic building blocks via API
calls. Human and automatic GPT-4V evaluations show that our approach surpasses
the standard GPT-4 and other language agents (e.g., ReAct and Reflexion) for 3D
mesh generation on ShapeNet. Moreover, when tested on our UFO benchmark, our
approach outperforms other state-of-the-art text-to-2D image and text-to-3D
models based on human evaluation.
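
The trial-and-error composition described above can be illustrated with a minimal sketch. This is not the paper's code and not the SimpleBlenv API: `ToyEnv`, `Box`, `propose_leg`, and `build_chair` are hypothetical names, the environment is a plain Python stand-in for a 3D simulator, and a scripted proposer takes the place of a large language model call. It only shows the general pattern of proposing a part, reading environment feedback, and revising the proposal.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field

EPS = 1e-9  # tolerance so that boxes which merely touch do not count as overlapping


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: a hypothetical stand-in for an atomic building block."""
    name: str
    center: tuple[float, float, float]
    size: tuple[float, float, float]

    def overlaps(self, other: Box) -> bool:
        # Two boxes overlap only if their extents intersect on every axis.
        return all(
            abs(a - b) < (sa + sb) / 2 - EPS
            for a, b, sa, sb in zip(self.center, other.center, self.size, other.size)
        )


@dataclass
class ToyEnv:
    """Hypothetical environment: accepts a part or returns textual feedback."""
    parts: list[Box] = field(default_factory=list)

    def add_box(self, box: Box) -> str:
        for placed in self.parts:
            if box.overlaps(placed):
                return f"rejected: {box.name} overlaps {placed.name}"
        self.parts.append(box)
        return "ok"


def propose_leg(index: int, n_legs: int, radius: float) -> Box:
    """Scripted proposer standing in for an LLM call (an assumption of this sketch)."""
    angle = 2 * math.pi * index / n_legs
    center = (radius * math.cos(angle), radius * math.sin(angle), 0.25)
    return Box(f"leg_{index}", center, (0.1, 0.1, 0.5))


def build_chair(n_legs: int = 5, max_attempts: int = 10) -> ToyEnv:
    env = ToyEnv()
    # The seat rests on top of the legs (z from 0.5 to 0.6).
    env.add_box(Box("seat", (0.0, 0.0, 0.55), (1.0, 1.0, 0.1)))
    for i in range(n_legs):
        radius = 0.02  # deliberately too small, so early attempts can collide
        for _ in range(max_attempts):
            feedback = env.add_box(propose_leg(i, n_legs, radius))
            if feedback == "ok":
                break
            radius += 0.1  # revise the proposal in response to the feedback
    return env


if __name__ == "__main__":
    for part in build_chair().parts:
        print(part.name, tuple(round(c, 3) for c in part.center))
```

In an actual language-agent setup, the feedback string would presumably be returned to the model so that it can choose the next action itself; the fixed radius increment here is only a placeholder for that decision.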