
arxiv:2402.09052

L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects

Published on Feb 14, 2024 · Submitted by AK on Feb 15, 2024

#3 Paper of the day

Authors: Yutaro Yamada, Khyathi Chandu, Yuchen Lin, Jack Hessel, Ilker Yildirim, Yejin Choi

AI-generated summary

L3GO, a language agent with chain-of-3D-thoughts, uses large language models to compose unconventional objects part by part in a 3D simulation environment. It outperforms standard GPT-4 and other language agents on ShapeNet mesh generation, and state-of-the-art text-to-2D image and text-to-3D models on the UFO benchmark.

Abstract

Diffusion-based image generation models such as DALL-E 3 and Stable Diffusion-XL demonstrate remarkable capabilities in generating images with realistic and unique compositions. Yet these models are not robust at reasoning precisely about the physical and spatial configurations of objects, especially when given unconventional, and therefore out-of-distribution, descriptions such as "a chair with five legs". In this paper, we propose a language agent with chain-of-3D-thoughts (L3GO), an inference-time approach that can reason about part-based 3D mesh generation of unconventional objects that current data-driven diffusion models struggle with. More concretely, we use large language models as agents to compose a desired object via trial-and-error within a 3D simulation environment. To facilitate our investigation, we develop a new benchmark, Unconventionally Feasible Objects (UFO), as well as SimpleBlenv, a wrapper environment built on top of Blender where language agents can build and compose atomic building blocks via API calls. Human and automatic GPT-4V evaluations show that our approach surpasses the standard GPT-4 and other language agents (e.g., ReAct and Reflexion) for 3D mesh generation on ShapeNet. Moreover, when tested on our UFO benchmark, our approach outperforms other state-of-the-art text-to-2D image and text-to-3D models based on human evaluation.
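
The trial-and-error, part-based composition described in the abstract can be illustrated with a small toy sketch. This is not the paper's implementation and not the SimpleBlenv API, which this page does not show: `ToyEnv`, `Box`, `build`, and `nudge_toward_origin` are hypothetical names, the environment is plain Python rather than Blender, and a fixed rule stands in for the language model that would normally propose the fix. It only shows the general loop of placing an atomic block, reading the environment's feedback, and retrying.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Hypothetical atomic building block: an axis-aligned box."""
    name: str
    center: Vec3
    size: Vec3

    def bounds(self):
        # (low, high) interval along each axis
        return tuple((c - s / 2, c + s / 2) for c, s in zip(self.center, self.size))


def overlaps(a: Box, b: Box, eps: float = 1e-9) -> bool:
    """True if the two boxes share interior volume (touching faces do not count)."""
    return all(
        lo_a < hi_b - eps and lo_b < hi_a - eps
        for (lo_a, hi_a), (lo_b, hi_b) in zip(a.bounds(), b.bounds())
    )


@dataclass
class ToyEnv:
    """Stand-in for a 3D simulation environment; NOT the real SimpleBlenv."""
    parts: List[Box] = field(default_factory=list)

    def add_box(self, box: Box) -> Optional[str]:
        """Place a box. Returns None on success, or a feedback message on failure."""
        for placed in self.parts:
            if overlaps(box, placed):
                return f"{box.name} intersects {placed.name}"
        self.parts.append(box)
        return None


def nudge_toward_origin(box: Box, feedback: str, step: float = 2.0) -> Box:
    """Fixed-rule stand-in for the agent's revision step (an LLM in the paper)."""
    def pull(v: float) -> float:
        return max(v - step, 0.0) if v > 0 else min(v + step, 0.0)
    x, y, z = box.center
    return Box(box.name, (pull(x), pull(y), z), box.size)


def build(env: ToyEnv, plan: List[Box],
          propose_fix: Callable[[Box, str], Box], max_retries: int = 3) -> bool:
    """Place each part in order, revising a part whenever the environment objects."""
    for box in plan:
        for _ in range(max_retries + 1):
            feedback = env.add_box(box)
            if feedback is None:
                break
            box = propose_fix(box, feedback)
        else:
            return False  # retries exhausted for this part
    return True


def five_leg_chair_plan() -> List[Box]:
    """A seat plus five legs; the fifth leg is deliberately misplaced onto leg_1."""
    leg_size = (1.0, 1.0, 4.0)
    corners = [(-4.0, -4.0), (4.0, -4.0), (-4.0, 4.0), (4.0, 4.0), (-4.0, -4.0)]
    legs = [Box(f"leg_{i + 1}", (x, y, 2.0), leg_size) for i, (x, y) in enumerate(corners)]
    seat = Box("seat", (0.0, 0.0, 5.0), (10.0, 10.0, 2.0))
    return [seat] + legs


if __name__ == "__main__":
    env = ToyEnv()
    ok = build(env, five_leg_chair_plan(), nudge_toward_origin)
    print(ok, [(p.name, p.center) for p in env.parts])
```

In this toy run the fifth leg first collides with `leg_1`, the environment reports the intersection, and the revised placement succeeds on the next attempt. A real agent would have to reason about much richer feedback than a single overlap check.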