Paper page - L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects
arxiv:2402.09052

L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects

Published on Feb 14, 2024 · Submitted by AK on Feb 15, 2024 · #3 Paper of the day
Authors: Yutaro Yamada, Khyathi Chandu, Yuchen Lin, Jack Hessel, Ilker Yildirim, Yejin Choi

Abstract

A language agent with chain-of-3D-thoughts (L3GO) improves 3D mesh generation for unconventional descriptions by using large language models to compose objects part by part in a 3D simulation environment. It outperforms standard GPT-4 and other language agents on ShapeNet, and other text-to-image and text-to-3D models on the UFO benchmark.

AI-generated summary

Diffusion-based image generation models such as DALL-E 3 and Stable Diffusion-XL demonstrate remarkable capabilities in generating images with realistic and unique compositions. Yet, these models are not robust in precisely reasoning about physical and spatial configurations of objects, especially when instructed with unconventional, thereby out-of-distribution descriptions, such as "a chair with five legs". In this paper, we propose a language agent with chain-of-3D-thoughts (L3GO), an inference-time approach that can reason about part-based 3D mesh generation of unconventional objects that current data-driven diffusion models struggle with. More concretely, we use large language models as agents to compose a desired object via trial-and-error within the 3D simulation environment. To facilitate our investigation, we develop a new benchmark, Unconventionally Feasible Objects (UFO), as well as SimpleBlenv, a wrapper environment built on top of Blender where language agents can build and compose atomic building blocks via API calls. Human and automatic GPT-4V evaluations show that our approach surpasses the standard GPT-4 and other language agents (e.g., ReAct and Reflexion) for 3D mesh generation on ShapeNet. Moreover, when tested on our UFO benchmark, our approach outperforms other state-of-the-art text-to-2D image and text-to-3D models based on human evaluation.
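The trial-and-error loop described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `Box`, `ToyEnv`, `propose_fix`, and `place` are hypothetical names, the environment is a toy box world rather than SimpleBlenv or the Blender API, and a hand-written rule stands in for the language model. It is not the authors' implementation. The sketch only shows the pattern of proposing a part, reading the environment's feedback, and revising the proposal, using the "chair with five legs" prompt as the target.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: a stand-in for an atomic building block."""
    name: str
    center: tuple  # (x, y, z)
    size: tuple    # (dx, dy, dz)


def gap(a, b, axis):
    """Signed gap between two boxes along one axis; negative means overlap."""
    a_lo, a_hi = a.center[axis] - a.size[axis] / 2, a.center[axis] + a.size[axis] / 2
    b_lo, b_hi = b.center[axis] - b.size[axis] / 2, b.center[axis] + b.size[axis] / 2
    return max(a_lo - b_hi, b_lo - a_hi)


class ToyEnv:
    """Hypothetical environment: accepts a part or explains the rejection."""

    def __init__(self, tol=1e-9):
        self.parts = []
        self.tol = tol

    def try_add(self, box):
        gaps = [[gap(box, other, ax) for ax in range(3)] for other in self.parts]
        for other, g in zip(self.parts, gaps):
            if max(g) < -self.tol:  # overlap on every axis: parts interpenetrate
                return False, "intersects " + other.name
        # A new part must touch at least one existing part (no positive gap).
        if self.parts and not any(max(g) <= self.tol for g in gaps):
            return False, "floating"
        self.parts.append(box)
        return True, "ok"


def propose_fix(box, feedback, step=0.03125):
    """Rule-based stand-in for the LLM: nudge the part along z."""
    x, y, z = box.center
    dz = -step if feedback.startswith("intersects") else step
    return replace(box, center=(x, y, z + dz))


def place(env, box, max_attempts=20):
    """Propose, read feedback, revise; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        ok, feedback = env.try_add(box)
        if ok:
            return attempt
        box = propose_fix(box, feedback)
    raise RuntimeError("could not place " + box.name)


def build_five_legged_chair():
    env = ToyEnv()
    place(env, Box("seat", (0.0, 0.0, 0.5), (1.0, 1.0, 0.125)))
    attempts = []
    for i in range(5):
        angle = 2 * math.pi * i / 5
        x, y = 0.375 * math.cos(angle), 0.375 * math.sin(angle)
        # Deliberately poor first guess: the leg starts inside the seat.
        leg = Box(f"leg{i}", (x, y, 0.5), (0.125, 0.125, 0.4375))
        attempts.append(place(env, leg))
    return env, attempts
```

Calling `build_five_legged_chair()` returns the environment with one seat and five legs, plus the number of attempts each leg needed. In this toy, the feedback string plays the role that environment feedback plays for the agent in the paper's setting; a real agent would revise its proposal with a language model rather than a fixed rule.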

Community

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

